1
00:00:00,000 --> 00:00:00,120

2
00:00:00,120 --> 00:00:02,500
The following content is
provided under a Creative

3
00:00:02,500 --> 00:00:03,910
Commons license.

4
00:00:03,910 --> 00:00:06,950
Your support will help MIT
OpenCourseWare continue to

5
00:00:06,950 --> 00:00:10,600
offer high-quality educational
resources for free.

6
00:00:10,600 --> 00:00:13,500
To make a donation, or view
additional materials from

7
00:00:13,500 --> 00:00:18,130
hundreds of MIT courses, visit
MIT OpenCourseWare at

8
00:00:18,130 --> 00:00:20,110
ocw.mit.edu.

9
00:00:20,110 --> 00:00:28,270
PROFESSOR: Welcome to 6.172,
Performance Engineering of

10
00:00:28,270 --> 00:00:31,520
Software Systems.

11
00:00:31,520 --> 00:00:35,050
This is a fun class.

12
00:00:35,050 --> 00:00:38,900
It's all about going fast.

13
00:00:38,900 --> 00:00:40,590
We all like to go fast--

14
00:00:40,590 --> 00:00:42,930
well, not all of us.

15
00:00:42,930 --> 00:00:46,540
But everybody in this class
likes to go fast.

16
00:00:46,540 --> 00:00:48,880
Otherwise you won't like
the class very much.

17
00:00:48,880 --> 00:00:50,835
It's all about making
stuff go fast.

18
00:00:50,835 --> 00:00:55,670

19
00:00:55,670 --> 00:00:57,770
So a couple of brief
announcements.

20
00:00:57,770 --> 00:01:01,230
We're going to be videotaping
for OpenCourseWare.

21
00:01:01,230 --> 00:01:05,360
So if you don't want to be
videotaped, you should sit it

22
00:01:05,360 --> 00:01:09,500
in a place where the camera
won't catch you.

23
00:01:09,500 --> 00:01:11,950
Because we're in such a large
classroom, I'd like everybody

24
00:01:11,950 --> 00:01:16,290
to sort of every day, without
me having to draw--

25
00:01:16,290 --> 00:01:18,690
I'm not really an artist.

26
00:01:18,690 --> 00:01:20,470
My sister's an artist,
actually.

27
00:01:20,470 --> 00:01:21,820
She's a painter.

28
00:01:21,820 --> 00:01:24,340
But maybe that gene
escaped me.

29
00:01:24,340 --> 00:01:27,280
It went that way and
not this way.

30
00:01:27,280 --> 00:01:29,390
So I don't want to have
to draw it up again.

31
00:01:29,390 --> 00:01:31,810
So everybody, try to confine
yourself to this sort of

32
00:01:31,810 --> 00:01:32,985
quarter of the classroom.

33
00:01:32,985 --> 00:01:36,860
And that way, it'll feel more
like a size room that

34
00:01:36,860 --> 00:01:38,110
we ought to be in.

35
00:01:38,110 --> 00:01:41,340

36
00:01:41,340 --> 00:01:51,060
Let me go to our course
information sheet.

37
00:01:51,060 --> 00:01:52,190
Did everybody get a copy of the

38
00:01:52,190 --> 00:01:55,750
course information handout?

39
00:01:55,750 --> 00:01:57,830
So we're going to the class--
we're going to start at the

40
00:01:57,830 --> 00:02:03,300
beginning with the Administrivia
stuff, try to

41
00:02:03,300 --> 00:02:05,610
run through that as
quickly as we can.

42
00:02:05,610 --> 00:02:11,009
And then my co-lecturer,
professor Saman Amarasinghe--

43
00:02:11,009 --> 00:02:13,290
if you can't pronounce or spell
his last name, you can

44
00:02:13,290 --> 00:02:14,540
just say "Saman"--

45
00:02:14,540 --> 00:02:16,660

46
00:02:16,660 --> 00:02:20,570
will give today's lecture.

47
00:02:20,570 --> 00:02:23,520
And we have four teaching
assistants.

48
00:02:23,520 --> 00:02:26,240
So teaching assistants,
please stand up.

49
00:02:26,240 --> 00:02:32,010
They are Josh, John,
Reed, and Eric.

50
00:02:32,010 --> 00:02:34,910
And let me tell you, what they
have done in the last few days

51
00:02:34,910 --> 00:02:37,340
to get us ready for this course
is nothing short of

52
00:02:37,340 --> 00:02:39,970
superhuman.

53
00:02:39,970 --> 00:02:43,880
So you really need to accord
them your respect, and they're

54
00:02:43,880 --> 00:02:46,060
going to really work hard
to help you folks do

55
00:02:46,060 --> 00:02:48,860
well in this class.

56
00:02:48,860 --> 00:02:50,630
We have, I think, a pretty
decent ratio.

57
00:02:50,630 --> 00:02:54,290
I think even with concerns about
cutbacks, et cetera, to

58
00:02:54,290 --> 00:02:58,350
have four TAs for this
is really good.

59
00:02:58,350 --> 00:03:00,870
And we need it, because
it's a software class.

60
00:03:00,870 --> 00:03:08,610
And software is, as you know,
bugs can take an arbitrarily

61
00:03:08,610 --> 00:03:09,860
long time to find.

62
00:03:09,860 --> 00:03:14,060

63
00:03:14,060 --> 00:03:15,180
You guys can sit.

64
00:03:15,180 --> 00:03:20,660
Besides our teaching assistants,
we have two

65
00:03:20,660 --> 00:03:24,770
support staff people helping us,
Mary McDavitt, who works

66
00:03:24,770 --> 00:03:28,700
for Saman, and Marcia Davidson,
who works for me.

67
00:03:28,700 --> 00:03:31,100
And they're generally pretty
good about being able to track

68
00:03:31,100 --> 00:03:36,490
us down if we're in
some far corner of

69
00:03:36,490 --> 00:03:39,760
the world or something.

70
00:03:39,760 --> 00:03:43,680
Our lectures are going to be
held on Tuesdays and Thursdays

71
00:03:43,680 --> 00:03:46,950
in this room, 2:30 to 4:00 PM.

72
00:03:46,950 --> 00:03:48,650
You should plan to
attend regularly.

73
00:03:48,650 --> 00:03:53,720
We dispense all kinds of
wonderful gems that you will

74
00:03:53,720 --> 00:03:57,650
miss if you think you're going
to get it just from reading

75
00:03:57,650 --> 00:04:01,660
and even looking at videotapes
or whatever.

76
00:04:01,660 --> 00:04:03,300
I'm not sure what our
arrangement is going to be

77
00:04:03,300 --> 00:04:05,640
about having videos available.

78
00:04:05,640 --> 00:04:11,580
OK, they're not going to be
available, except unless

79
00:04:11,580 --> 00:04:13,730
there's something really
extenuating circumstance.

80
00:04:13,730 --> 00:04:20,670
You have to go into a hospital
to have your forebrain removed

81
00:04:20,670 --> 00:04:23,150
or something--

82
00:04:23,150 --> 00:04:24,400
something serious, OK?

83
00:04:24,400 --> 00:04:27,450

84
00:04:27,450 --> 00:04:29,990
All material in the lecture,
even if it's delivered oral,

85
00:04:29,990 --> 00:04:32,340
is fair game for projects
and quizzes.

86
00:04:32,340 --> 00:04:36,360
We will post the lecture
slides, but they don't

87
00:04:36,360 --> 00:04:40,030
generally convey all
of the information.

88
00:04:40,030 --> 00:04:40,770
And we're going to be

89
00:04:40,770 --> 00:04:42,200
video-recorded, as I mentioned.

90
00:04:42,200 --> 00:04:45,970
If you don't want to appear, sit
behind the camera, and you

91
00:04:45,970 --> 00:04:47,760
won't appear.

92
00:04:47,760 --> 00:04:51,020
We have no recitations
in this class.

93
00:04:51,020 --> 00:04:53,490
However, the teaching assistants
will periodically

94
00:04:53,490 --> 00:04:59,270
hold primers where we will
bring you up to speed on

95
00:04:59,270 --> 00:05:02,620
various technologies that you
may not be up to speed in,

96
00:05:02,620 --> 00:05:07,770
such as C. How many people have
programmed in C before?

97
00:05:07,770 --> 00:05:10,075
And how many have not programmed
in C before?

98
00:05:10,075 --> 00:05:10,820
A few.

99
00:05:10,820 --> 00:05:12,070
OK.

100
00:05:12,070 --> 00:05:13,980

101
00:05:13,980 --> 00:05:18,590
Tools such as VTune.

102
00:05:18,590 --> 00:05:19,880
Who's used VTune before?

103
00:05:19,880 --> 00:05:22,790

104
00:05:22,790 --> 00:05:25,340
Who has not used VTune?

105
00:05:25,340 --> 00:05:26,590
Good.

106
00:05:26,590 --> 00:05:28,560

107
00:05:28,560 --> 00:05:31,780
Pin-- who's used Pin?

108
00:05:31,780 --> 00:05:32,920
OK.

109
00:05:32,920 --> 00:05:34,800
Who has used Cilk++?

110
00:05:34,800 --> 00:05:37,930

111
00:05:37,930 --> 00:05:39,510
OK, a couple people.

112
00:05:39,510 --> 00:05:42,275
So you see, we're going to have
some primers to bring you

113
00:05:42,275 --> 00:05:43,620
up to speed in those
kinds of things.

114
00:05:43,620 --> 00:05:46,800
We don't expect that you came
in knowing any of those

115
00:05:46,800 --> 00:05:48,050
things, particularly.

116
00:05:48,050 --> 00:05:51,020

117
00:05:51,020 --> 00:05:55,290
We're going to be using
Stellar for our course

118
00:05:55,290 --> 00:05:56,590
management.

119
00:05:56,590 --> 00:05:58,400
And here's the URL.

120
00:05:58,400 --> 00:06:01,240
And you should make sure that
you're registered as a member

121
00:06:01,240 --> 00:06:04,100
of 6.172, because otherwise,
you're not going to get any

122
00:06:04,100 --> 00:06:04,700
announcements.

123
00:06:04,700 --> 00:06:05,580
We're going to do all of our

124
00:06:05,580 --> 00:06:07,400
administration through Stellar.

125
00:06:07,400 --> 00:06:10,040
So make sure that you're
registered there.

126
00:06:10,040 --> 00:06:13,670

127
00:06:13,670 --> 00:06:17,140
At the end of today, I think
it is, we'll send out an

128
00:06:17,140 --> 00:06:21,950
announcement to the student
mailing list.

129
00:06:21,950 --> 00:06:25,330
If you don't get something
by the end today--

130
00:06:25,330 --> 00:06:27,660
so you wake up tomorrow morning,
you have no email

131
00:06:27,660 --> 00:06:29,400
from the course staff.

132
00:06:29,400 --> 00:06:32,592
What that means is you
should email us.

133
00:06:32,592 --> 00:06:34,632
AUDIENCE: Well, that mailing
list would be the

134
00:06:34,632 --> 00:06:35,500
preregistered ones.

135
00:06:35,500 --> 00:06:37,520
PROFESSOR: That'd be the
preregistered ones.

136
00:06:37,520 --> 00:06:39,420
And then that way, we'll
get all the people.

137
00:06:39,420 --> 00:06:41,710
So if you know you're not
preregistered, you can send us

138
00:06:41,710 --> 00:06:42,960
email before then.

139
00:06:42,960 --> 00:06:45,650

140
00:06:45,650 --> 00:06:49,190
There are two one-hour quizzes
given during class time.

141
00:06:49,190 --> 00:06:50,450
AUDIENCE: One and a half.

142
00:06:50,450 --> 00:06:51,290
PROFESSOR: What's that?

143
00:06:51,290 --> 00:06:52,520
AUDIENCE: One and a half hour.

144
00:06:52,520 --> 00:06:53,980
PROFESSOR: Oh, you're right.

145
00:06:53,980 --> 00:06:56,530
It's one and a half
hour quizzes.

146
00:06:56,530 --> 00:06:58,440
Yes, two one and a half
hour quizzes.

147
00:06:58,440 --> 00:07:00,150
We'll correct that in
the online version.

148
00:07:00,150 --> 00:07:03,420

149
00:07:03,420 --> 00:07:06,190
Given during class time.

150
00:07:06,190 --> 00:07:08,300
Can somebody, by the way,
correct this as we go along?

151
00:07:08,300 --> 00:07:10,780
Great.

152
00:07:10,780 --> 00:07:13,120
The dates are on the course
calendar, which is available

153
00:07:13,120 --> 00:07:14,460
on Stellar.

154
00:07:14,460 --> 00:07:19,850
We've had mixed results with the
ICS feed from Stellar, but

155
00:07:19,850 --> 00:07:24,075
you're supposed to be able to
download the calendar to your

156
00:07:24,075 --> 00:07:29,040
iPhone or other ICS-capable
devices.

157
00:07:29,040 --> 00:07:33,590
But we were successful in
downloading, but all the times

158
00:07:33,590 --> 00:07:35,620
were screwed up.

159
00:07:35,620 --> 00:07:38,750
But anyway, hopefully, maybe
they'll allow that.

160
00:07:38,750 --> 00:07:40,870
But anyway, the calendar
is there.

161
00:07:40,870 --> 00:07:44,940
The quizzes will generally be
closed book and closed notes,

162
00:07:44,940 --> 00:07:48,970
but you will be permitted crib
sheets, one sheet that you

163
00:07:48,970 --> 00:07:52,340
summarize anything that you feel
that you need to know.

164
00:07:52,340 --> 00:07:54,930

165
00:07:54,930 --> 00:07:58,760
Generally, everything that we
cover in here is fair game for

166
00:07:58,760 --> 00:08:00,770
the quizzes, including
things that are in

167
00:08:00,770 --> 00:08:02,930
prerequisite courses.

168
00:08:02,930 --> 00:08:07,100
There is no final exam because
we have a final project.

169
00:08:07,100 --> 00:08:12,490
This is a fast-moving class, so
if you get caught behind,

170
00:08:12,490 --> 00:08:16,680
it's not a good sign, because
it's really hard to work

171
00:08:16,680 --> 00:08:18,820
double-hard to keep
up in this class.

172
00:08:18,820 --> 00:08:21,120
You really have to keep
on top of things.

173
00:08:21,120 --> 00:08:23,840
And so for that reason
and others--

174
00:08:23,840 --> 00:08:26,440
because it's really
difficult for us--

175
00:08:26,440 --> 00:08:28,370
it's fast moving for
us as well--

176
00:08:28,370 --> 00:08:30,390
we generally cannot accept
late projects.

177
00:08:30,390 --> 00:08:33,030
If you haven't been able to
complete a project, you should

178
00:08:33,030 --> 00:08:37,100
just package up your partial
solution and hand that in, and

179
00:08:37,100 --> 00:08:39,970
you'll get partial
credit on it.

180
00:08:39,970 --> 00:08:44,130
Do not just say, oh, I'm not
going to hand something in,

181
00:08:44,130 --> 00:08:47,060
and expect it to get
it done later.

182
00:08:47,060 --> 00:08:50,360
If you do find yourself in an
unusual situation which you

183
00:08:50,360 --> 00:08:52,960
think there are some extenuating
circumstances,

184
00:08:52,960 --> 00:08:56,820
like the aforementioned brain
surgery or something, let us

185
00:08:56,820 --> 00:08:58,040
know in advance.

186
00:08:58,040 --> 00:09:00,150
So if you're going to get in
a car accident, it's really

187
00:09:00,150 --> 00:09:04,810
helpful if you tell us the day
before so that we can plan.

188
00:09:04,810 --> 00:09:07,070
Obviously can't always do that,
but if it is something

189
00:09:07,070 --> 00:09:09,060
for which there is any
reasonable expectation it

190
00:09:09,060 --> 00:09:13,980
should be done beforehand,
you should let us know.

191
00:09:13,980 --> 00:09:17,870
And we may require, for those
kinds of exceptions, some kind

192
00:09:17,870 --> 00:09:22,470
of confirmation from a dean
or medical professional.

193
00:09:22,470 --> 00:09:26,730
So if that's the route that
you're in, you can plan a

194
00:09:26,730 --> 00:09:30,190
little bit and make sure that
those confirmations

195
00:09:30,190 --> 00:09:32,510
are on their way.

196
00:09:32,510 --> 00:09:36,530
The grading, this is essentially
approximately how

197
00:09:36,530 --> 00:09:39,090
we're going to grade.

198
00:09:39,090 --> 00:09:42,900
As you can see, there's a
bunch of content in the

199
00:09:42,900 --> 00:09:46,360
projects and a bunch
in the quizzes.

200
00:09:46,360 --> 00:09:49,970
And then there's also
participation grade.

201
00:09:49,970 --> 00:09:53,970
And the participation
is both in--

202
00:09:53,970 --> 00:09:57,090
we will have design reviews and
such, so that's part of

203
00:09:57,090 --> 00:09:59,030
the participation,
as well as asking

204
00:09:59,030 --> 00:10:01,150
questions and such in class.

205
00:10:01,150 --> 00:10:03,320
And because we're
video-recording everything, we

206
00:10:03,320 --> 00:10:05,990
know exactly who's asking
questions, unless you choose

207
00:10:05,990 --> 00:10:09,410
to sit outside the video zone.

208
00:10:09,410 --> 00:10:13,760

209
00:10:13,760 --> 00:10:21,470
In addition, if there is a
significant missing element to

210
00:10:21,470 --> 00:10:23,920
your work, it doesn't matter how
well you did on the other

211
00:10:23,920 --> 00:10:26,070
things, you will receive
a failing grade.

212
00:10:26,070 --> 00:10:30,390
You must, essentially,
do all the work.

213
00:10:30,390 --> 00:10:35,490
So if you do not get substantial
credit on the

214
00:10:35,490 --> 00:10:39,450
final project or any other two
assignments, for example, you

215
00:10:39,450 --> 00:10:42,110
can expect to receive
a failing grade.

216
00:10:42,110 --> 00:10:44,660
It goes downhill very quickly.

217
00:10:44,660 --> 00:10:46,330
If you're missing one
assignment, generally don't

218
00:10:46,330 --> 00:10:50,320
expect to get higher than
a C, for example.

219
00:10:50,320 --> 00:10:54,230
It goes downhill very, very
quickly, because that's what

220
00:10:54,230 --> 00:10:55,480
this is all about is hands on.

221
00:10:55,480 --> 00:11:00,120

222
00:11:00,120 --> 00:11:03,430
What's permissible
in this class?

223
00:11:03,430 --> 00:11:06,340
So it's your responsibility to
satisfy both the letter and

224
00:11:06,340 --> 00:11:08,200
spirit of the rules.

225
00:11:08,200 --> 00:11:10,380
So if you have any questions,
let us know.

226
00:11:10,380 --> 00:11:12,210
Here's what the rules are.

227
00:11:12,210 --> 00:11:15,680
Also, let me just say how
Saman and I treat this.

228
00:11:15,680 --> 00:11:19,730
We have been involved in
cheating incidents off and on

229
00:11:19,730 --> 00:11:21,020
over the years.

230
00:11:21,020 --> 00:11:23,180
I've been at MIT almost
30 years.

231
00:11:23,180 --> 00:11:24,823
Saman, you've been here--

232
00:11:24,823 --> 00:11:25,276
PROFESSOR: 13.

233
00:11:25,276 --> 00:11:27,230
PROFESSOR: 13 years.

234
00:11:27,230 --> 00:11:29,720
So we've been here long time,
longer than you folks.

235
00:11:29,720 --> 00:11:32,360
We've been through this
a bunch of times.

236
00:11:32,360 --> 00:11:34,510
It's really a pain when
somebody cheats.

237
00:11:34,510 --> 00:11:39,390
It robs me of a huge amount of
my time that I could otherwise

238
00:11:39,390 --> 00:11:44,000
be spending on teaching and
students and my own

239
00:11:44,000 --> 00:11:46,000
professional development.

240
00:11:46,000 --> 00:11:51,250
However, we both take cheating
incidents extremely seriously.

241
00:11:51,250 --> 00:11:55,950
So we have this thing that is
sometimes really inconvenient

242
00:11:55,950 --> 00:11:56,970
called integrity.

243
00:11:56,970 --> 00:11:59,400
And I'd rather, if I find
somebody's cheating, just give

244
00:11:59,400 --> 00:12:01,970
you an F and not deal with it.

245
00:12:01,970 --> 00:12:03,910
Drat.

246
00:12:03,910 --> 00:12:07,510
Instead, when somebody's
cheating, I tend to want to

247
00:12:07,510 --> 00:12:10,750
take it to the Committee on
Discipline, where you face

248
00:12:10,750 --> 00:12:12,810
things like expulsion,
et cetera.

249
00:12:12,810 --> 00:12:14,490
So I don't like to do that.

250
00:12:14,490 --> 00:12:15,830
It's very time consuming
for me.

251
00:12:15,830 --> 00:12:21,450
But I find it hard to just look
the other way, because I

252
00:12:21,450 --> 00:12:25,900
view the issue of integrity as
protecting the vast majority

253
00:12:25,900 --> 00:12:30,790
of students who are
obeying the rules.

254
00:12:30,790 --> 00:12:36,150
So when a cheating incident
occurs, I sort of feel honor

255
00:12:36,150 --> 00:12:39,500
bound to push it to its
proper conclusion.

256
00:12:39,500 --> 00:12:42,020

257
00:12:42,020 --> 00:12:45,660
So please don't cheat.

258
00:12:45,660 --> 00:12:49,250
We will use a variety of means
at our disposal, including

259
00:12:49,250 --> 00:12:52,110
technological ones, to
detect cheating.

260
00:12:52,110 --> 00:12:55,140
Also, if you think that sharing
your results with

261
00:12:55,140 --> 00:12:57,640
somebody else means
that they're

262
00:12:57,640 --> 00:12:59,310
cheating and you're not--

263
00:12:59,310 --> 00:13:01,650
in some cultures that's
the way it's treated--

264
00:13:01,650 --> 00:13:02,900
no.

265
00:13:02,900 --> 00:13:04,690

266
00:13:04,690 --> 00:13:08,200
In this culture, in particular
the culture in this classroom,

267
00:13:08,200 --> 00:13:12,010
they're both equally guilty of
academic dishonesty, whether

268
00:13:12,010 --> 00:13:16,880
you are the giver
or the taker.

269
00:13:16,880 --> 00:13:21,380
Equally, we treat that
as equal offense.

270
00:13:21,380 --> 00:13:22,770
So here's the particular
things.

271
00:13:22,770 --> 00:13:27,760
You may not share, generally,
your solutions with people who

272
00:13:27,760 --> 00:13:29,630
are not in your group.

273
00:13:29,630 --> 00:13:31,310
So we will have a bunch
of group projects,

274
00:13:31,310 --> 00:13:32,560
generally in pairs.

275
00:13:32,560 --> 00:13:37,020

276
00:13:37,020 --> 00:13:42,030
And that includes both people
in the class and people

277
00:13:42,030 --> 00:13:44,590
outside the class.

278
00:13:44,590 --> 00:13:47,840
So generally, you keep
things to yourself.

279
00:13:47,840 --> 00:13:49,960
When you're in a group, of
course, you can share ideas

280
00:13:49,960 --> 00:13:51,930
within the group.

281
00:13:51,930 --> 00:13:56,440
But we expect that everybody's
making a fair contribution.

282
00:13:56,440 --> 00:13:59,740
So you should be concerned if
you are not holding up your

283
00:13:59,740 --> 00:14:01,630
end of your group
participation.

284
00:14:01,630 --> 00:14:04,300

285
00:14:04,300 --> 00:14:07,150
And we're going to ask you to
describe the contributions of

286
00:14:07,150 --> 00:14:08,150
the people in your group.

287
00:14:08,150 --> 00:14:10,600
Generally, the group,
as I say, is two.

288
00:14:10,600 --> 00:14:13,600
So it's basically saying,
what did you do?

289
00:14:13,600 --> 00:14:14,970
How did you divide
up your project?

290
00:14:14,970 --> 00:14:15,470
What did you do?

291
00:14:15,470 --> 00:14:18,450
What did the other person do?

292
00:14:18,450 --> 00:14:21,860
You generally can't share
it with anybody else.

293
00:14:21,860 --> 00:14:26,150
You can't copy or transcribe
solutions from other places.

294
00:14:26,150 --> 00:14:27,880
The work you submit
must be your own.

295
00:14:27,880 --> 00:14:29,510
Pretty straightforward.

296
00:14:29,510 --> 00:14:33,380
You may go out and use general
conceptual material, such that

297
00:14:33,380 --> 00:14:38,190
you might get from a good
textbook or from Wikipedia or

298
00:14:38,190 --> 00:14:39,120
someplace like that.

299
00:14:39,120 --> 00:14:43,000
That's perfectly fine, resources
you would ordinarily

300
00:14:43,000 --> 00:14:44,610
have available.

301
00:14:44,610 --> 00:14:49,220
But if you do use any material
from an external source, you

302
00:14:49,220 --> 00:14:56,590
should properly cite it in
normal academic that fashion.

303
00:14:56,590 --> 00:14:59,360
Cite where it is so that
somebody else can go and find

304
00:14:59,360 --> 00:15:02,830
and look to see what it
is that you're citing.

305
00:15:02,830 --> 00:15:07,670
If you feel that you have
transgressed in

306
00:15:07,670 --> 00:15:10,160
some way, let us know.

307
00:15:10,160 --> 00:15:12,840
It always goes way easier.

308
00:15:12,840 --> 00:15:16,380
I generally tend to be kind of a
nice guy when somebody comes

309
00:15:16,380 --> 00:15:22,260
to me saying, oops, verses I
hear from somewhere else or

310
00:15:22,260 --> 00:15:26,340
one of our tools detects
something else.

311
00:15:26,340 --> 00:15:28,910
Then I'm kind of cranky.

312
00:15:28,910 --> 00:15:31,950
So I'd rather, if you come to
me, we could work something

313
00:15:31,950 --> 00:15:35,070
out, generally.

314
00:15:35,070 --> 00:15:39,600
Most of your out-of-class
time will spent

315
00:15:39,600 --> 00:15:41,820
completing six projects.

316
00:15:41,820 --> 00:15:44,110
And there's an outline of
what the projects are.

317
00:15:44,110 --> 00:15:47,710
They're all fun projects.

318
00:15:47,710 --> 00:15:48,870
You can ask the TAs.

319
00:15:48,870 --> 00:15:51,930
The TAs will tell you how much
fun they are, because they've

320
00:15:51,930 --> 00:15:56,660
generally helped to design
and built their own

321
00:15:56,660 --> 00:15:58,550
solutions to them.

322
00:15:58,550 --> 00:16:00,420
Some will be done in pairs,
some individually.

323
00:16:00,420 --> 00:16:04,340
The final project can either be
done in pairs or in threes,

324
00:16:04,340 --> 00:16:05,910
I think we said.

325
00:16:05,910 --> 00:16:06,950
AUDIENCE: [INAUDIBLE]

326
00:16:06,950 --> 00:16:09,490
PROFESSOR: Yes, we also have--
we didn't put that in here.

327
00:16:09,490 --> 00:16:16,960
We also have a Project 0, which
will be available today

328
00:16:16,960 --> 00:16:17,970
after class.

329
00:16:17,970 --> 00:16:21,170
So you should go to Stellar
and get Project 0.

330
00:16:21,170 --> 00:16:24,690
Project 0 is not to
be handed in.

331
00:16:24,690 --> 00:16:28,160
It's getting warmed up with all
of the machines and tools

332
00:16:28,160 --> 00:16:29,800
that we'll be using
in the class.

333
00:16:29,800 --> 00:16:33,730
We have been very fortunate to
have very nice contributions

334
00:16:33,730 --> 00:16:46,028
from Dell and Intel of a new
Cluster facility that has 16

335
00:16:46,028 --> 00:16:47,278
12-core machines.

336
00:16:47,278 --> 00:16:50,740

337
00:16:50,740 --> 00:16:56,370
And they're Intel Core
i7 machines.

338
00:16:56,370 --> 00:17:01,620
Each chip has six cores, and
there's two cores per machine,

339
00:17:01,620 --> 00:17:06,790
each with 48 gigabytes
of memory.

340
00:17:06,790 --> 00:17:11,400
So these are kind of honking
modern facility.

341
00:17:11,400 --> 00:17:13,270
So as I say, it's fun.

342
00:17:13,270 --> 00:17:16,589
It's fun when you do something
on your laptop, and then you

343
00:17:16,589 --> 00:17:20,160
go and you do it on one
of these machines and

344
00:17:20,160 --> 00:17:21,960
it goes like this.

345
00:17:21,960 --> 00:17:23,840
It's really fun when these
things go fast.

346
00:17:23,840 --> 00:17:26,960

347
00:17:26,960 --> 00:17:31,450
So Project 0, you should
start on today.

348
00:17:31,450 --> 00:17:37,130
And it's basically the handout
called Getting Started.

349
00:17:37,130 --> 00:17:39,890
It will be available
on Stellar.

350
00:17:39,890 --> 00:17:43,210
And it will take you through
getting access to these

351
00:17:43,210 --> 00:17:44,600
machines and so forth.

352
00:17:44,600 --> 00:17:48,220
In particular, in order to use
these machines, you must have

353
00:17:48,220 --> 00:17:51,040
a CSAIL account.

354
00:17:51,040 --> 00:17:54,480
Who does not have
a CSAIL account?

355
00:17:54,480 --> 00:17:57,950
So that is going to be the first
thing you need to do,

356
00:17:57,950 --> 00:17:59,040
because you've got
to get that.

357
00:17:59,040 --> 00:18:00,960
So what you want to do is get

358
00:18:00,960 --> 00:18:04,110
registered for a CSAIL account--

359
00:18:04,110 --> 00:18:05,250
and we give instructions
in there.

360
00:18:05,250 --> 00:18:08,200
It's basically going
to a web page--

361
00:18:08,200 --> 00:18:11,280
pretty much as soon as you
get out of the classroom.

362
00:18:11,280 --> 00:18:14,000
Because you won't be able to
start anything until you have

363
00:18:14,000 --> 00:18:15,910
this CSAIL account.

364
00:18:15,910 --> 00:18:18,530
And hopefully, we'll be able
to give you access within a

365
00:18:18,530 --> 00:18:19,780
few hours after you register.

366
00:18:19,780 --> 00:18:24,700

367
00:18:24,700 --> 00:18:25,310
Yeah, question?

368
00:18:25,310 --> 00:18:27,938
AUDIENCE: What operating
systems are the tools--

369
00:18:27,938 --> 00:18:28,810
PROFESSOR: Linux.

370
00:18:28,810 --> 00:18:31,380
We'll be using Linux.

371
00:18:31,380 --> 00:18:35,140
So there are a variety of ways
that we describe for how you

372
00:18:35,140 --> 00:18:37,860
use these machines remotely.

373
00:18:37,860 --> 00:18:40,400
We have the Andrew
File System--

374
00:18:40,400 --> 00:18:41,400
AFS--

375
00:18:41,400 --> 00:18:44,690
which you can use so that you
can edit things locally but

376
00:18:44,690 --> 00:18:50,010
have an SSH terminal window on
the machine, which makes it

377
00:18:50,010 --> 00:18:51,160
very convenient.

378
00:18:51,160 --> 00:18:54,420
On the other hand, for some
people, often some dorm rooms

379
00:18:54,420 --> 00:18:58,030
with poor connectivity, doing
things by AFS may be

380
00:18:58,030 --> 00:18:59,460
too slow for you.

381
00:18:59,460 --> 00:19:01,535
But there are a variety of other
ways that you can-- you

382
00:19:01,535 --> 00:19:03,680
can actually edit right
on the machine.

383
00:19:03,680 --> 00:19:07,470
And there are a variety of ways
of having a repository on

384
00:19:07,470 --> 00:19:10,510
your machine, because we'll
be using Git as

385
00:19:10,510 --> 00:19:15,730
our versioning mechanism.

386
00:19:15,730 --> 00:19:18,160
So you'd be able to have it
there and be able to make

387
00:19:18,160 --> 00:19:19,520
copies back on the
other machine.

388
00:19:19,520 --> 00:19:21,330
So there are a variety
of ways to work.

389
00:19:21,330 --> 00:19:24,530
And you're free to make the
software run on your own

390
00:19:24,530 --> 00:19:26,920
laptops or workstations.

391
00:19:26,920 --> 00:19:30,290
The one thing I'll say is we
will not support that.

392
00:19:30,290 --> 00:19:39,430
We don't have the resources to
support beyond using the

393
00:19:39,430 --> 00:19:42,310
machines that we actually
provide.

394
00:19:42,310 --> 00:19:45,970
I think I've got to
go faster here.

395
00:19:45,970 --> 00:19:51,000
So let's see.

396
00:19:51,000 --> 00:19:52,250
So software engineering.

397
00:19:52,250 --> 00:19:54,840

398
00:19:54,840 --> 00:19:58,380
So research in programmer
productivity has demonstrated

399
00:19:58,380 --> 00:20:02,080
that pair programming, where two
programmers sit together,

400
00:20:02,080 --> 00:20:07,460
one at the keyboard and one
looking on, is actually faster

401
00:20:07,460 --> 00:20:11,580
than two programmers
coding separately.

402
00:20:11,580 --> 00:20:14,010
Why do you suppose that is?

403
00:20:14,010 --> 00:20:15,021
Yeah?

404
00:20:15,021 --> 00:20:16,985
AUDIENCE: You don't get stuck.

405
00:20:16,985 --> 00:20:19,440
The other person can tell you--
it's like when you're

406
00:20:19,440 --> 00:20:21,895
[INAUDIBLE] writing on the
blackboard, you get nervous.

407
00:20:21,895 --> 00:20:22,877
You don't know what
you're doing.

408
00:20:22,877 --> 00:20:23,859
[INAUDIBLE]

409
00:20:23,859 --> 00:20:25,830
PROFESSOR: Yeah, you
could be stuck.

410
00:20:25,830 --> 00:20:26,160
What else?

411
00:20:26,160 --> 00:20:28,108
AUDIENCE: You don't
procrastinate.

412
00:20:28,108 --> 00:20:30,543
PROFESSOR: Yeah, you don't
procrastinate.

413
00:20:30,543 --> 00:20:31,620
That's a good one.

414
00:20:31,620 --> 00:20:32,260
Don't procrastinate.

415
00:20:32,260 --> 00:20:33,460
AUDIENCE: Finding bugs.

416
00:20:33,460 --> 00:20:34,730
PROFESSOR: Yeah, finding bugs.

417
00:20:34,730 --> 00:20:39,750
So the main one is finding bugs,
that the person who's

418
00:20:39,750 --> 00:20:41,430
looking on finds bugs.

419
00:20:41,430 --> 00:20:43,820
In fact, I was just pair
programming with Reed the

420
00:20:43,820 --> 00:20:46,070
other day, and we were trying
to figure out, why is this

421
00:20:46,070 --> 00:20:48,340
thing not working?

422
00:20:48,340 --> 00:20:50,750
And I say, gee, it looks fine.

423
00:20:50,750 --> 00:20:54,670
It says, next y equals x.

424
00:20:54,670 --> 00:20:58,590
And he says, no, next y is
supposed to equal y.

425
00:20:58,590 --> 00:21:01,810
And it's like, we had both read
over that code, but as

426
00:21:01,810 --> 00:21:04,080
soon as I spelled it out,
I didn't even notice.

427
00:21:04,080 --> 00:21:06,080
But of course, him in hearing it
says, oh, it's supposed to

428
00:21:06,080 --> 00:21:08,880
be x and y and y, not x and y.

429
00:21:08,880 --> 00:21:12,600
So anyway, that kind of thing,
you pick up very quickly.

430
00:21:12,600 --> 00:21:15,060
It makes it very
easy to debug.

431
00:21:15,060 --> 00:21:19,070
Likewise, regression testing
demonstrably promotes fast

432
00:21:19,070 --> 00:21:20,780
co-development.

433
00:21:20,780 --> 00:21:24,670
So this is where you build a
battery of tests that includes

434
00:21:24,670 --> 00:21:28,040
tests for every mistake
that you've ever made.

435
00:21:28,040 --> 00:21:31,030
And it can also be helpful to
write unit tests, where you

436
00:21:31,030 --> 00:21:32,410
test individual ones.

437
00:21:32,410 --> 00:21:36,190
And you should also use tools
like the assert package.

438
00:21:36,190 --> 00:21:40,060
And some of these things will be
taken through in Project 0.

439
00:21:40,060 --> 00:21:44,120

440
00:21:44,120 --> 00:21:45,910
The MIT POSSE.

441
00:21:45,910 --> 00:21:49,270
We have recruited senior
software engineers in the

442
00:21:49,270 --> 00:21:52,420
region to share with you their
invaluable knowledge and

443
00:21:52,420 --> 00:21:54,020
experience.

444
00:21:54,020 --> 00:21:57,480
We call them Masters In The
Practice Of Software Systems

445
00:21:57,480 --> 00:22:01,670
Engineering, which has the
great acronym MIT POSSE.

446
00:22:01,670 --> 00:22:04,580
These are people who
are not being paid.

447
00:22:04,580 --> 00:22:06,520
And they're senior.

448
00:22:06,520 --> 00:22:09,130
They're doing this because
they're volunteering, because

449
00:22:09,130 --> 00:22:11,420
they want to help.

450
00:22:11,420 --> 00:22:14,540
You have a lot to learn from
these people, and we will have

451
00:22:14,540 --> 00:22:17,060
code and design reviews with
these people with your

452
00:22:17,060 --> 00:22:18,240
assigned master.

453
00:22:18,240 --> 00:22:20,840
You want to treat them nicely.

454
00:22:20,840 --> 00:22:29,990
They're not doing this so that
some snotty-nosed kid can say,

455
00:22:29,990 --> 00:22:31,940
I don't care about you.

456
00:22:31,940 --> 00:22:32,770
That's not what they're doing.

457
00:22:32,770 --> 00:22:35,200
They're there to help you
get a good grade.

458
00:22:35,200 --> 00:22:37,995
And these are some really
talented people, people who

459
00:22:37,995 --> 00:22:46,500
are very famous in their
own areas of expertise.

460
00:22:46,500 --> 00:22:49,820
So be gracious, is the main
thing I would say.

461
00:22:49,820 --> 00:22:51,070
Be gracious.

462
00:22:51,070 --> 00:22:54,310

463
00:22:54,310 --> 00:22:56,970
Let's see.

464
00:22:56,970 --> 00:22:58,430
The assignments.

465
00:22:58,430 --> 00:22:59,920
You can read about
the assignments.

466
00:22:59,920 --> 00:23:01,810
Basically, generally, what we're
going to do is have a

467
00:23:01,810 --> 00:23:06,390
beta submission in which we're
going to test everything.

468
00:23:06,390 --> 00:23:11,390
And we're also going to
take all your tests.

469
00:23:11,390 --> 00:23:14,160
We're going to throw all the
tests into a big pool and run

470
00:23:14,160 --> 00:23:17,160
everybody's tests against
everybody software.

471
00:23:17,160 --> 00:23:20,730
You will lose points if you
fail some of the tests,

472
00:23:20,730 --> 00:23:22,430
assuming the tests are
correct tests.

473
00:23:22,430 --> 00:23:24,600
You will lose points if you
have an incorrect test.

474
00:23:24,600 --> 00:23:27,470
You will lose points if somebody
passes your test and

475
00:23:27,470 --> 00:23:29,490
they have a bug.

476
00:23:29,490 --> 00:23:32,310
You will get points if your test
picks up something that

477
00:23:32,310 --> 00:23:36,550
very few other groups
tested for.

478
00:23:36,550 --> 00:23:39,580
So you have to both submit code
and submit tests, which

479
00:23:39,580 --> 00:23:42,560
is OK, because you've got to
test your own stuff before you

480
00:23:42,560 --> 00:23:44,960
hand it in.

481
00:23:44,960 --> 00:23:48,370
But we're also going to be using
it to test each other.

482
00:23:48,370 --> 00:23:53,935
After beta, we will have a
design and code review with

483
00:23:53,935 --> 00:23:55,950
your master.

484
00:23:55,950 --> 00:23:58,470
And let me just say, the
master's there not affecting

485
00:23:58,470 --> 00:24:00,600
your grade in any way.

486
00:24:00,600 --> 00:24:03,060
They do give us reports about
what happened in their

487
00:24:03,060 --> 00:24:05,520
meeting, but they have no
bearing on the grade.

488
00:24:05,520 --> 00:24:06,790
They're just there to help.

489
00:24:06,790 --> 00:24:09,480

490
00:24:09,480 --> 00:24:14,190
So you don't have to
view them as--

491
00:24:14,190 --> 00:24:16,080
they're not part of
the course staff.

492
00:24:16,080 --> 00:24:17,770
If you want to get angry at
somebody about how things are

493
00:24:17,770 --> 00:24:20,320
done, get angry at
Saman and me.

494
00:24:20,320 --> 00:24:23,510
Don't get angry at them.

495
00:24:23,510 --> 00:24:27,790
The final submission is
generally due a couple weeks

496
00:24:27,790 --> 00:24:28,830
after the beta.

497
00:24:28,830 --> 00:24:31,340
And it's after you've gotten
feedback on your design,

498
00:24:31,340 --> 00:24:34,090
feedback on things.

499
00:24:34,090 --> 00:24:36,040
We'll give you the whole
battery of everybody's

500
00:24:36,040 --> 00:24:39,110
regression tests, so you can
test out your code against

501
00:24:39,110 --> 00:24:42,950
everybody's regression
test and use that.

502
00:24:42,950 --> 00:24:45,980
And then generally, the scoring
is such that people

503
00:24:45,980 --> 00:24:49,590
who get fast code
get high marks.

504
00:24:49,590 --> 00:24:51,840
People who get slow codes
get lower marks.

505
00:24:51,840 --> 00:24:55,040

506
00:24:55,040 --> 00:24:56,490
If you need help,
[UNINTELLIGIBLE].

507
00:24:56,490 --> 00:24:59,320
Speed is fun.

508
00:24:59,320 --> 00:25:00,570
Enjoy the class.

509
00:25:00,570 --> 00:25:03,600

510
00:25:03,600 --> 00:25:05,192
Yeah, question?

511
00:25:05,192 --> 00:25:08,690
AUDIENCE: In terms of
the bug testing, I'm

512
00:25:08,690 --> 00:25:10,121
thinking of bugs like--

513
00:25:10,121 --> 00:25:11,075
I don't know--

514
00:25:11,075 --> 00:25:12,029
[INAUDIBLE]

515
00:25:12,029 --> 00:25:14,891
or something like that.

516
00:25:14,891 --> 00:25:17,700
If you happen to not have made
a bug in your own program

517
00:25:17,700 --> 00:25:20,206
[INAUDIBLE]

518
00:25:20,206 --> 00:25:23,200
would that count against you
in the final testing.

519
00:25:23,200 --> 00:25:28,566
PROFESSOR: You have to view
your beta test as testing

520
00:25:28,566 --> 00:25:30,510
other peoples' code also.

521
00:25:30,510 --> 00:25:34,330
PROFESSOR: But if you
[UNINTELLIGIBLE] a test that

522
00:25:34,330 --> 00:25:39,000
finds bugs in somebody's
program, and if very few

523
00:25:39,000 --> 00:25:42,150
people wrote that test, then
you will get points.

524
00:25:42,150 --> 00:25:46,110
So you don't have to test for
the world, everything.

525
00:25:46,110 --> 00:25:48,940
If you fail somebody's test,
but not yours, you will be

526
00:25:48,940 --> 00:25:50,160
deducted points.

527
00:25:50,160 --> 00:25:53,490
If your test finds other
people's bugs, you will gain

528
00:25:53,490 --> 00:25:54,790
points for that.

529
00:25:54,790 --> 00:25:58,120
But on the other hand, if you
don't have a test for--

530
00:25:58,120 --> 00:26:01,680
what you said, I think you don't
have to go and test for

531
00:26:01,680 --> 00:26:04,750
every possible condition
in there.

532
00:26:04,750 --> 00:26:06,990
The negative side is, if
you miss something,

533
00:26:06,990 --> 00:26:07,940
we will deduct points.

534
00:26:07,940 --> 00:26:11,390
Positive side is, if your test
finds somebody else's bugs,

535
00:26:11,390 --> 00:26:12,480
you'll get points for it.

536
00:26:12,480 --> 00:26:15,150
We'll go into detail exactly
how [UNINTELLIGIBLE].

537
00:26:15,150 --> 00:26:18,430
Another thing we did last
year, which was kind of

538
00:26:18,430 --> 00:26:18,870
interesting--

539
00:26:18,870 --> 00:26:21,300
I think there were mixed--

540
00:26:21,300 --> 00:26:22,650
some people really liked
it, some people

541
00:26:22,650 --> 00:26:23,630
were not that happy--

542
00:26:23,630 --> 00:26:26,890
is a little bit of competitive
grading.

543
00:26:26,890 --> 00:26:29,420
So here, what we are doing
is not the full grade--

544
00:26:29,420 --> 00:26:30,340
small part of the grade.

545
00:26:30,340 --> 00:26:37,510
What you do is when you first
submit your beta, rerun

546
00:26:37,510 --> 00:26:40,110
performance tests
of everybody.

547
00:26:40,110 --> 00:26:42,010
And of course, one of you
are going to get the

548
00:26:42,010 --> 00:26:44,130
fastest code for that.

549
00:26:44,130 --> 00:26:45,850
At that point, we are going
to say, OK, this

550
00:26:45,850 --> 00:26:47,880
is the fastest code.

551
00:26:47,880 --> 00:26:49,090
And how [UNINTELLIGIBLE]

552
00:26:49,090 --> 00:26:51,870
everybody else with that code?

553
00:26:51,870 --> 00:26:57,030
And the person who gets fastest
code for that part of

554
00:26:57,030 --> 00:26:58,670
the grade gets the
full points.

555
00:26:58,670 --> 00:27:02,150
And everybody else will get a
partial part of the points.

556
00:27:02,150 --> 00:27:06,170
However, you get a chance in
your final submission to

557
00:27:06,170 --> 00:27:08,880
basically improve your code.

558
00:27:08,880 --> 00:27:11,610
If you do better than the last
time's best code, you won't

559
00:27:11,610 --> 00:27:13,250
get any more additional points,
because then we don't

560
00:27:13,250 --> 00:27:14,370
want to create
[? an arms race. ?]

561
00:27:14,370 --> 00:27:17,020
So you at least had to
match the previous

562
00:27:17,020 --> 00:27:19,970
times to get full points.

563
00:27:19,970 --> 00:27:23,870
So you get a chance to see how
you stand against your peers

564
00:27:23,870 --> 00:27:26,560
and get a chance to actually
[UNINTELLIGIBLE] math with

565
00:27:26,560 --> 00:27:27,030
your peers.

566
00:27:27,030 --> 00:27:32,220
So first time we did it, we
said, OK, it's the fraction of

567
00:27:32,220 --> 00:27:37,200
your performance against the
performance of the best code.

568
00:27:37,200 --> 00:27:40,680
The problem was, there were a
couple of codes that 1000x

569
00:27:40,680 --> 00:27:42,690
performance improvement,
and everybody got 10x.

570
00:27:42,690 --> 00:27:45,170
So you've got, like,
0.25 grade.

571
00:27:45,170 --> 00:27:46,060
That didn't help.

572
00:27:46,060 --> 00:27:49,140
So now we actually are doing
a logarithmic log scale.

573
00:27:49,140 --> 00:27:53,300
So we'll get the fraction, and
we'll take the log off that.

574
00:27:53,300 --> 00:27:56,080
So there will be a little
bit more equality.

575
00:27:56,080 --> 00:27:59,570
So you will know how worse off
you are, and you get a chance

576
00:27:59,570 --> 00:28:00,160
to [UNINTELLIGIBLE].

577
00:28:00,160 --> 00:28:02,570
Because it is very important,
because a lot of times, there

578
00:28:02,570 --> 00:28:05,480
might be a crucial thing
that you missed.

579
00:28:05,480 --> 00:28:08,780
And instead of trying to be,
"aha, got you" in this class,

580
00:28:08,780 --> 00:28:11,700
we are trying to give you a
chance to actually learn.

581
00:28:11,700 --> 00:28:14,010
So if you missed it, OK, you
missed it the first time.

582
00:28:14,010 --> 00:28:17,150
But you get the second chance
to come and fix it and see

583
00:28:17,150 --> 00:28:18,560
whether you can do well.

584
00:28:18,560 --> 00:28:21,130
And of course, if you do better
than the best grade,

585
00:28:21,130 --> 00:28:23,640
there are bragging rights, but
you don't get extra points.

586
00:28:23,640 --> 00:28:27,960

587
00:28:27,960 --> 00:28:31,870
With that, let me--

588
00:28:31,870 --> 00:28:33,120
where are we?

589
00:28:33,120 --> 00:28:36,610

590
00:28:36,610 --> 00:28:40,990
So again, as Charles pointed
out, I'm pretty excited about

591
00:28:40,990 --> 00:28:41,390
this class.

592
00:28:41,390 --> 00:28:43,130
This is a really fun class.

593
00:28:43,130 --> 00:28:44,230
Unfortunately, I'm a little
bit under the weather.

594
00:28:44,230 --> 00:28:46,030
So I'm going to sit down.

595
00:28:46,030 --> 00:28:52,230
And if I don't sound that
enthusiastic, hopefully I'll

596
00:28:52,230 --> 00:28:53,390
get enthusiastic later.

597
00:28:53,390 --> 00:28:56,730
And that's one reason we ask you
to come forward, because

598
00:28:56,730 --> 00:28:58,470
we don't have to come
here and shout.

599
00:28:58,470 --> 00:29:00,400
So hopefully people in
the back can here.

600
00:29:00,400 --> 00:29:02,890
If you don't, still, there are
some seats in the front.

601
00:29:02,890 --> 00:29:05,290
You can move forward.

602
00:29:05,290 --> 00:29:09,090
So for the rest of the class,
what we want to do is go

603
00:29:09,090 --> 00:29:13,180
through an example and look at
why performance matters.

604
00:29:13,180 --> 00:29:16,270

605
00:29:16,270 --> 00:29:17,900
For that, I selected
matrix multiply.

606
00:29:17,900 --> 00:29:20,890
And I think your Project 0,
you are going to also play

607
00:29:20,890 --> 00:29:23,010
with matrix multiply.

608
00:29:23,010 --> 00:29:26,730
And I think that you'll get a
feel for what kind of things

609
00:29:26,730 --> 00:29:31,730
will happened and how much
room you have in

610
00:29:31,730 --> 00:29:33,640
doing well in here.

611
00:29:33,640 --> 00:29:36,260
And at the same time, as I go
through this, you'll get a

612
00:29:36,260 --> 00:29:39,750
feel for what's the process that
you might be following

613
00:29:39,750 --> 00:29:41,750
through the class and what
kind of things--

614
00:29:41,750 --> 00:29:46,100
because linear programming,
after taking a bunch of

615
00:29:46,100 --> 00:29:48,780
programming classes, probably,
you have a methodology set up

616
00:29:48,780 --> 00:29:51,190
in your mind, how to go
about doing that.

617
00:29:51,190 --> 00:29:54,500
And the same thing after this
class, we want you to have a

618
00:29:54,500 --> 00:29:57,580
methodology of going through
and improving performance.

619
00:29:57,580 --> 00:30:02,590
And this is basically a little
bit of a contrite example, but

620
00:30:02,590 --> 00:30:04,120
you'll get a feel.

621
00:30:04,120 --> 00:30:06,270
So what's matrix multiply?

622
00:30:06,270 --> 00:30:10,000
It's a very fundamental
operation for many

623
00:30:10,000 --> 00:30:10,700
computations.

624
00:30:10,700 --> 00:30:13,940
So you are going to produce
Matrix A by multiplying Matrix

625
00:30:13,940 --> 00:30:18,410
B and C. So you want to
calculate each value in here

626
00:30:18,410 --> 00:30:22,330
to take a row of B and a column
of C, and you get each

627
00:30:22,330 --> 00:30:27,150
element of A. And you do that
for every element of A. And

628
00:30:27,150 --> 00:30:29,900
this is the possible simplest
code you can write in a

629
00:30:29,900 --> 00:30:32,150
language like C, a
[? three-loop nest, ?]

630
00:30:32,150 --> 00:30:34,980
and this matrix multiply.

631
00:30:34,980 --> 00:30:38,310
So, so far, things should
be pretty simple.

632
00:30:38,310 --> 00:30:42,030
So what I did was I looked at
matrix multiply and said,

633
00:30:42,030 --> 00:30:45,820
look, you have taken software
engineering class.

634
00:30:45,820 --> 00:30:48,660
You have thought about
object-oriented programming,

635
00:30:48,660 --> 00:30:51,160
lot of nice techniques
to build very

636
00:30:51,160 --> 00:30:52,720
useful, portable software.

637
00:30:52,720 --> 00:30:55,100
And let's try to write
matrix multiply with

638
00:30:55,100 --> 00:30:57,090
some of those concepts.

639
00:30:57,090 --> 00:31:00,310
So the first thing I want to
do is matrix multiply write

640
00:31:00,310 --> 00:31:04,640
object oriented using immutable
classes, and that,

641
00:31:04,640 --> 00:31:06,730
in fact, I want matrix class
to represent both

642
00:31:06,730 --> 00:31:08,470
integers and doubles.

643
00:31:08,470 --> 00:31:11,200
So here's my class
structuring here.

644
00:31:11,200 --> 00:31:14,680
And actually, I'm going to show
you a bunch of code now,

645
00:31:14,680 --> 00:31:15,490
next couple of slides.

646
00:31:15,490 --> 00:31:17,540
I'm not expecting you
to follow everything

647
00:31:17,540 --> 00:31:18,350
instantaneously.

648
00:31:18,350 --> 00:31:20,140
I'm going to go through
somewhat faster.

649
00:31:20,140 --> 00:31:21,570
But you have the slides.

650
00:31:21,570 --> 00:31:25,140
So I always expect you to,
perhaps, in some free time, to

651
00:31:25,140 --> 00:31:27,590
go and look at some of the
slides, get a little bit of

652
00:31:27,590 --> 00:31:28,990
better feel for what happened.

653
00:31:28,990 --> 00:31:32,590
So here's what matrix
multiply does.

654
00:31:32,590 --> 00:31:38,750
I have a value class that
figures out the matrix type.

655
00:31:38,750 --> 00:31:40,300
And if it's integer,
it keep integers.

656
00:31:40,300 --> 00:31:42,770
If it's a double,
I keep a double.

657
00:31:42,770 --> 00:31:46,190
And here, basically, do
this value in here.

658
00:31:46,190 --> 00:31:49,300
And then I have a
matrix class.

659
00:31:49,300 --> 00:31:53,820
Basically, given a matrix type
and a matrix, it can create a

660
00:31:53,820 --> 00:31:54,650
matrix here.

661
00:31:54,650 --> 00:32:01,930
This basically creates your
matrix in here by first

662
00:32:01,930 --> 00:32:06,050
creating a bunch of rows here,
and then instantiating each

663
00:32:06,050 --> 00:32:09,280
row in here, whether it's
integer or double, depending

664
00:32:09,280 --> 00:32:11,340
on the type you want.

665
00:32:11,340 --> 00:32:13,180
And then, of course, there are
other things in matrix,

666
00:32:13,180 --> 00:32:15,590
basically, in here.

667
00:32:15,590 --> 00:32:17,210
And you can update a matrix.

668
00:32:17,210 --> 00:32:18,590
This is going to be a little
bit interesting.

669
00:32:18,590 --> 00:32:21,290
I will give a little bit here,
because what I'm going is I'm

670
00:32:21,290 --> 00:32:22,380
doing immutable types.

671
00:32:22,380 --> 00:32:24,390
That means I can't change
anything in the matrix.

672
00:32:24,390 --> 00:32:27,780
If you are updating, you have
to create a new matrix.

673
00:32:27,780 --> 00:32:29,140
So I'll show a little
bit about how to

674
00:32:29,140 --> 00:32:30,040
go about doing that.

675
00:32:30,040 --> 00:32:33,060
And then there's
this row class.

676
00:32:33,060 --> 00:32:34,590
I'm going through this
fast, because

677
00:32:34,590 --> 00:32:35,840
it's not that important.

678
00:32:35,840 --> 00:32:39,820

679
00:32:39,820 --> 00:32:42,730
If you get a chance, go through
this and get a feel

680
00:32:42,730 --> 00:32:43,710
for what I'm doing.

681
00:32:43,710 --> 00:32:48,565
But the idea is doing this very
software engineering way

682
00:32:48,565 --> 00:32:51,270
of implementing a
matrix multiply.

683
00:32:51,270 --> 00:32:53,880
So then here's my matrix
multiply in here, that what

684
00:32:53,880 --> 00:32:56,550
I'm doing is I'm updating each
of these matrices by giving a

685
00:32:56,550 --> 00:33:01,340
value of B and C and
adding it to A and

686
00:33:01,340 --> 00:33:03,960
updating A again in here.

687
00:33:03,960 --> 00:33:09,370
So when I run matrix multiply,
1,024 by 1,024 matrix, this is

688
00:33:09,370 --> 00:33:10,606
the time I got.

689
00:33:10,606 --> 00:33:13,520
Actually, it took a long
time, in fact.

690
00:33:13,520 --> 00:33:17,410
And the key thing, is the
performance good?

691
00:33:17,410 --> 00:33:19,240
That's the first question
to ask.

692
00:33:19,240 --> 00:33:22,190
Because you run, you
get a result.

693
00:33:22,190 --> 00:33:23,240
When the result is correct,
OK, great.

694
00:33:23,240 --> 00:33:24,410
You get the correct result.

695
00:33:24,410 --> 00:33:25,730
But how do you do?

696
00:33:25,730 --> 00:33:28,470
And this is sometimes a very
hard thing, to know that you

697
00:33:28,470 --> 00:33:32,630
have a problem is itself
a huge win--

698
00:33:32,630 --> 00:33:33,760
that you figure out you
have a problem.

699
00:33:33,760 --> 00:33:36,660
And what I'm going to do is go
through a little bit of a

700
00:33:36,660 --> 00:33:39,510
calculation to see
how we did it.

701
00:33:39,510 --> 00:33:43,200
It took about five hours to do
this matrix multiply, and that

702
00:33:43,200 --> 00:33:45,380
doesn't look right already.

703
00:33:45,380 --> 00:33:47,570
So if you think about matrix
multiply, it's an N-cube

704
00:33:47,570 --> 00:33:51,640
operation, basically, what
matrix multiply is doing.

705
00:33:51,640 --> 00:33:53,150
That's the algorithmic
[? class. ?]

706
00:33:53,150 --> 00:33:56,330
If you look at Charles' book,
that's what it would say.

707
00:33:56,330 --> 00:33:57,230
It's N-cube operation.

708
00:33:57,230 --> 00:34:00,700
So this is the number of
operations you want to do.

709
00:34:00,700 --> 00:34:04,820
And for each matrix operation,
you only do three index

710
00:34:04,820 --> 00:34:09,710
updaters, add, a multiply,
and a check--

711
00:34:09,710 --> 00:34:12,310
probably six ops to
do, basically.

712
00:34:12,310 --> 00:34:15,929
And that means I need to do
this many operations.

713
00:34:15,929 --> 00:34:19,670
Basically, you can think about
machine operations to do,

714
00:34:19,670 --> 00:34:23,659
basically, because you have to
read three values, update one,

715
00:34:23,659 --> 00:34:28,260
do a multiply and add,
and do a branch.

716
00:34:28,260 --> 00:34:31,949
And if I look at that with my
hours, I am doing this many

717
00:34:31,949 --> 00:34:33,929
operations per second,
which looks high.

718
00:34:33,929 --> 00:34:36,920
Big number.

719
00:34:36,920 --> 00:34:41,310
But my machine, actually, on my
somewhat old PC, transfers

720
00:34:41,310 --> 00:34:42,239
even much faster.

721
00:34:42,239 --> 00:34:48,500
So if you think about it, I am
taking about 8,300 cycles to

722
00:34:48,500 --> 00:34:50,570
do any kind of a machine
operation.

723
00:34:50,570 --> 00:34:51,880
This doesn't look right.

724
00:34:51,880 --> 00:34:55,110

725
00:34:55,110 --> 00:34:58,620
So at this point, this is what
I would call a "backup

726
00:34:58,620 --> 00:35:02,100
[UNINTELLIGIBLE] calculation."
I haven't proven anything.

727
00:35:02,100 --> 00:35:05,180
This one worked well in the
theory class, but if you can

728
00:35:05,180 --> 00:35:07,845
just jot down a couple of these
things, you can get a

729
00:35:07,845 --> 00:35:10,740
feel for where you
stand very fast.

730
00:35:10,740 --> 00:35:13,530
And then you realize, OK, I
am a couple of orders of

731
00:35:13,530 --> 00:35:15,870
magnitude off here on what
best I could do.

732
00:35:15,870 --> 00:35:18,630
And that should give you a
feel for what's going on.

733
00:35:18,630 --> 00:35:21,250
OK, so what's going on here?

734
00:35:21,250 --> 00:35:22,410
How can we improve
performance?

735
00:35:22,410 --> 00:35:24,510
Because we know we
have a problem.

736
00:35:24,510 --> 00:35:26,850
So what we can do is
we can look deeply

737
00:35:26,850 --> 00:35:28,370
into the program execution.

738
00:35:28,370 --> 00:35:30,570
And at some points, OK, this
program runs slow.

739
00:35:30,570 --> 00:35:31,070
[UNINTELLIGIBLE]

740
00:35:31,070 --> 00:35:33,190
time spent.

741
00:35:33,190 --> 00:35:37,270
So we can figure out time spent,
basically, things like

742
00:35:37,270 --> 00:35:38,580
by method--

743
00:35:38,580 --> 00:35:41,300
which method you spend most
of the time running.

744
00:35:41,300 --> 00:35:42,740
Sometimes by line.

745
00:35:42,740 --> 00:35:45,790
And we will learn a bunch of
profiling tools that will give

746
00:35:45,790 --> 00:35:46,580
you this feel.

747
00:35:46,580 --> 00:35:48,780
So instead of reading a million
lines of code, looking

748
00:35:48,780 --> 00:35:52,590
at every line in there, you can
very fast converge onto

749
00:35:52,590 --> 00:35:55,120
the culprits where most of
the time being spent.

750
00:35:55,120 --> 00:35:58,640
And that's a very easy way
of triaging your problem.

751
00:35:58,640 --> 00:36:03,620
And you look at things like time
spent, cumulative time

752
00:36:03,620 --> 00:36:06,440
spent within that and everything
called underneath,

753
00:36:06,440 --> 00:36:08,540
number of invocations
of the functions.

754
00:36:08,540 --> 00:36:12,130
And a lot times, you have a feel
for what it should be.

755
00:36:12,130 --> 00:36:14,720
And when you look at this, if
it doesn't look right, you

756
00:36:14,720 --> 00:36:17,030
say, wait a minute, this goes
against my intuition.

757
00:36:17,030 --> 00:36:19,320
Perhaps there's a problem.

758
00:36:19,320 --> 00:36:23,690
So what you can do is, by
doing this, you can

759
00:36:23,690 --> 00:36:24,930
identify hot spots.

760
00:36:24,930 --> 00:36:27,910
Basically, if 90% of the time
it's it one routine, that's

761
00:36:27,910 --> 00:36:29,870
what you want to fix.

762
00:36:29,870 --> 00:36:34,220
And hopefully, that routine is
doing something too slow.

763
00:36:34,220 --> 00:36:38,080
Then you can actually get good
performance in here.

764
00:36:38,080 --> 00:36:43,040
So here's the profile I ran
for a matrix multiply.

765
00:36:43,040 --> 00:36:46,040
And interestingly, what I found
was, most of the time

766
00:36:46,040 --> 00:36:50,000
spent in basically creating
doubles--

767
00:36:50,000 --> 00:36:55,430
number of calls, time spent
here, this much time was

768
00:36:55,430 --> 00:37:02,710
spending, basically creating
new values in here, this

769
00:37:02,710 --> 00:37:03,960
initializing a double in here.

770
00:37:03,960 --> 00:37:07,940

771
00:37:07,940 --> 00:37:09,110
So I'm doing matrix multiply.

772
00:37:09,110 --> 00:37:11,360
Why should I be initializing
values?

773
00:37:11,360 --> 00:37:12,510
So this is an issue.

774
00:37:12,510 --> 00:37:16,290
And I have the most number
of calls to that too.

775
00:37:16,290 --> 00:37:18,530
So we can go look at it and say,
hey, that would seem to

776
00:37:18,530 --> 00:37:19,960
be a problem here.

777
00:37:19,960 --> 00:37:22,580
So the problem here was
kind of obvious.

778
00:37:22,580 --> 00:37:25,350
What happened was I didn't use
immutable [? class. ?]

779
00:37:25,350 --> 00:37:27,090
So this is my matrix
representation.

780
00:37:27,090 --> 00:37:31,360
I have a bunch of pointers
pointing to each row.

781
00:37:31,360 --> 00:37:34,040
And now assume I won't
update this element.

782
00:37:34,040 --> 00:37:36,320
And of course, I can't modify
the entire thing.

783
00:37:36,320 --> 00:37:37,120
It's immutable.

784
00:37:37,120 --> 00:37:41,490
So what I had to do is create
a copy that updates.

785
00:37:41,490 --> 00:37:43,580
And then, of course, I had
to create a new matrix.

786
00:37:43,580 --> 00:37:46,600
I don't have to copy everybody
else, because these guys were

787
00:37:46,600 --> 00:37:47,350
not changed.

788
00:37:47,350 --> 00:37:50,920
So I can place a pointer to
these things and create this

789
00:37:50,920 --> 00:37:54,140
new matrix that points
to these things.

790
00:37:54,140 --> 00:37:56,870
And then, if you think about
what's going on, in fact, and

791
00:37:56,870 --> 00:38:00,705
this is my new matrix after I
do that, I [? updated ?] two

792
00:38:00,705 --> 00:38:03,625
end copies for each update,
because what I had to do is I

793
00:38:03,625 --> 00:38:06,280
had to copy this entire
row up to get that.

794
00:38:06,280 --> 00:38:08,750
And I had to create this entire
index again for each

795
00:38:08,750 --> 00:38:09,930
point in each row.

796
00:38:09,930 --> 00:38:12,890
So just to update one element,
I am creating two end copies.

797
00:38:12,890 --> 00:38:16,300
So what that means is now our N
cube matrix multiply ends up

798
00:38:16,300 --> 00:38:19,240
being N to the power
of 4 algorithm.

799
00:38:19,240 --> 00:38:21,290
So suddenly, you realize, wait
a minute, I'm doing something

800
00:38:21,290 --> 00:38:21,600
bad in here.

801
00:38:21,600 --> 00:38:22,340
I'm actually [UNINTELLIGIBLE]

802
00:38:22,340 --> 00:38:25,900
N-cube algorithm, and the way
I saw, it becomes n to the

803
00:38:25,900 --> 00:38:28,600
power of 4 because I'm making
all these copies in there.

804
00:38:28,600 --> 00:38:32,290
So this is a bad thing
to do, of course.

805
00:38:32,290 --> 00:38:34,700
And copying is very costly,
because you're making

806
00:38:34,700 --> 00:38:36,060
duplicates in here.

807
00:38:36,060 --> 00:38:38,430
And of course, you're creating
a huge amount of garbage,

808
00:38:38,430 --> 00:38:41,660
because you are just copying and
leaving the previous value

809
00:38:41,660 --> 00:38:42,570
for garbage collection.

810
00:38:42,570 --> 00:38:45,480
So garbage collector's coming
up all the time, and huge

811
00:38:45,480 --> 00:38:46,180
memory footprint.

812
00:38:46,180 --> 00:38:48,280
So this is no good.

813
00:38:48,280 --> 00:38:50,220
And can we do better?

814
00:38:50,220 --> 00:38:51,590
Hopefully, we can.

815
00:38:51,590 --> 00:38:53,890
So the next thing we did was,
OK, just get rid of these

816
00:38:53,890 --> 00:38:55,850
immutable types.

817
00:38:55,850 --> 00:38:58,410
That seemed to be a
[UNINTELLIGIBLE]

818
00:38:58,410 --> 00:38:59,460
here.

819
00:38:59,460 --> 00:39:01,570
And then, I actually have--

820
00:39:01,570 --> 00:39:03,610
I will go faster over the
code, so if you are

821
00:39:03,610 --> 00:39:06,330
interested, you can go look
at the code in here.

822
00:39:06,330 --> 00:39:12,420
So here, what we did was, still
we have the issues like

823
00:39:12,420 --> 00:39:15,330
object oriented and ability
to represent both

824
00:39:15,330 --> 00:39:16,850
integers and doubles.

825
00:39:16,850 --> 00:39:19,710
So what that means is each
matrix can be the integer

826
00:39:19,710 --> 00:39:21,360
matrix or a double matrix.

827
00:39:21,360 --> 00:39:24,430
So this is where you're
instantiating with integers or

828
00:39:24,430 --> 00:39:26,890
doubles in here.

829
00:39:26,890 --> 00:39:28,780
So we do that.

830
00:39:28,780 --> 00:39:31,670
And how much you think this
can improve performance by

831
00:39:31,670 --> 00:39:33,070
doing that?

832
00:39:33,070 --> 00:39:34,680
In fact, this was pretty nice.

833
00:39:34,680 --> 00:39:37,810
I actually got a 219x
performance improvement

834
00:39:37,810 --> 00:39:40,560
because I went from n to the
power 4 algorithm to n to the

835
00:39:40,560 --> 00:39:41,590
power 3 algorithm.

836
00:39:41,590 --> 00:39:44,450
So here is interesting,
algorithm has changed with it,

837
00:39:44,450 --> 00:39:48,810
and we have a pretty nice
performance improvement.

838
00:39:48,810 --> 00:39:53,210
So if I look at this now, what
I find is I'm spending a lot

839
00:39:53,210 --> 00:39:56,750
of method time, I'm doing
matrix multiply.

840
00:39:56,750 --> 00:39:57,780
That's good.

841
00:39:57,780 --> 00:39:59,890
But I'm spending all
this time doing get

842
00:39:59,890 --> 00:40:02,500
double and get methods.

843
00:40:02,500 --> 00:40:03,850
I'm trying to do matrix
multiply.

844
00:40:03,850 --> 00:40:05,730
Why am I spending all this
time trying to do

845
00:40:05,730 --> 00:40:08,630
get double and get?

846
00:40:08,630 --> 00:40:13,360
What I found was issue is the
method call overhead.

847
00:40:13,360 --> 00:40:15,740
Now, what happened is matrix row
has two different matrix

848
00:40:15,740 --> 00:40:17,690
types, the integer row
and double row.

849
00:40:17,690 --> 00:40:22,870
So every time I want to get a
row, I don't know if I have

850
00:40:22,870 --> 00:40:24,030
integer row or double row.

851
00:40:24,030 --> 00:40:26,800
I have to do a look-up to
figure out my type.

852
00:40:26,800 --> 00:40:30,480
Am I working with the integer
matrix or matrix in doubles?

853
00:40:30,480 --> 00:40:33,760
And this is what we call
"dynamic dispatch." So instead

854
00:40:33,760 --> 00:40:36,460
of knowing exactly the methods
we are to go, I had to

855
00:40:36,460 --> 00:40:38,380
actually look up, OK, which
method I'm supposed to go.

856
00:40:38,380 --> 00:40:41,580
I have to do that since, if the
matrix is integer, I might

857
00:40:41,580 --> 00:40:44,550
have to call a different method
than matrix is double.

858
00:40:44,550 --> 00:40:45,480
I have to check that.

859
00:40:45,480 --> 00:40:47,890
And then that call, instead
of direct call,

860
00:40:47,890 --> 00:40:49,320
becomes indirect branch.

861
00:40:49,320 --> 00:40:52,010
I have to check that and
basically branch on that.

862
00:40:52,010 --> 00:40:54,290
And these indirect branches are
costly, because what that

863
00:40:54,290 --> 00:40:58,260
means is I don't where I am
going until I do that test.

864
00:40:58,260 --> 00:41:00,250
So no machine--

865
00:41:00,250 --> 00:41:02,490
we'll go into details
and pipelines and

866
00:41:02,490 --> 00:41:03,780
stuff as we go on.

867
00:41:03,780 --> 00:41:06,160
What that means is normally,
a machine can have a lot of

868
00:41:06,160 --> 00:41:07,600
instruction in flight.

869
00:41:07,600 --> 00:41:11,130
So that means that I have to
know a lot more information to

870
00:41:11,130 --> 00:41:12,560
get the instructions going.

871
00:41:12,560 --> 00:41:15,660
But here, until you resolve the
instructions, you can't do

872
00:41:15,660 --> 00:41:16,580
the next one.

873
00:41:16,580 --> 00:41:18,870
So it's called a, basically,
pipeline stall.

874
00:41:18,870 --> 00:41:21,150
So instead of having hundreds
of instructions going,

875
00:41:21,150 --> 00:41:23,290
basically, suddenly I can't do
anything until the previous

876
00:41:23,290 --> 00:41:24,720
instruction is done.

877
00:41:24,720 --> 00:41:29,850
So the machine slows down
a lot because of this.

878
00:41:29,850 --> 00:41:33,210

879
00:41:33,210 --> 00:41:34,700
Normally, what you can do
is keep fetching next

880
00:41:34,700 --> 00:41:35,440
instructions here.

881
00:41:35,440 --> 00:41:38,730
We have to fetch, test, then the
test result is available,

882
00:41:38,730 --> 00:41:41,476
then that's when you can fetch
the next instruction.

883
00:41:41,476 --> 00:41:44,860

884
00:41:44,860 --> 00:41:45,940
Let me go through that file.

885
00:41:45,940 --> 00:41:48,800
So direct batch means target
address is known.

886
00:41:48,800 --> 00:41:50,970
You can fetch ahead of the
targets/ If I know where I am

887
00:41:50,970 --> 00:41:53,660
going, even if I don't get
there, I can fetch ahead of

888
00:41:53,660 --> 00:41:54,070
the target.

889
00:41:54,070 --> 00:41:58,970
I can go start fetching what
happens next ahead, basically,

890
00:41:58,970 --> 00:42:00,650
into [? the branch. ?]

891
00:42:00,650 --> 00:42:03,540
Sometimes, you can even fetch
both directions, or the

892
00:42:03,540 --> 00:42:05,520
direction you might always
go might get

893
00:42:05,520 --> 00:42:06,380
fetched by the architect.

894
00:42:06,380 --> 00:42:09,060
We'll talk about a lot of
architecture details in the

895
00:42:09,060 --> 00:42:09,970
future lectures.

896
00:42:09,970 --> 00:42:12,150
Indirect branch means until I
calculate the address, I have

897
00:42:12,150 --> 00:42:13,850
no idea where I'm going.

898
00:42:13,850 --> 00:42:15,490
That address is only
calculated.

899
00:42:15,490 --> 00:42:20,720
And because of that, hardware
slows down like crazy.

900
00:42:20,720 --> 00:42:25,280
So this seems to be not a good
thing to have these integers

901
00:42:25,280 --> 00:42:28,310
and doubles instead of having
one nice class of just

902
00:42:28,310 --> 00:42:29,360
[UNINTELLIGIBLE]

903
00:42:29,360 --> 00:42:30,330
might be nice.

904
00:42:30,330 --> 00:42:32,580
I just basically create
a double class.

905
00:42:32,580 --> 00:42:34,730
It's all matrices are doubles.

906
00:42:34,730 --> 00:42:37,310
If I want integers, I will just
rewrite my matrix class

907
00:42:37,310 --> 00:42:39,840
or copy and replace.

908
00:42:39,840 --> 00:42:43,670
Now my matrix class becomes
much more simpler.

909
00:42:43,670 --> 00:42:47,310
This is what I got, and this
is my row in here.

910
00:42:47,310 --> 00:42:52,220
And voila, minute I do that, I
basically went up about a 2.4x

911
00:42:52,220 --> 00:42:54,890
performance improvement, because
now I don't have to

912
00:42:54,890 --> 00:42:56,510
keep doing that.

913
00:42:56,510 --> 00:42:58,610
And altogether now, I have
improved my performance by

914
00:42:58,610 --> 00:43:00,850
about 500x here.

915
00:43:00,850 --> 00:43:03,270
And for ops per cycle, we
have gone down from

916
00:43:03,270 --> 00:43:07,186
8,000 to 38 to 16.

917
00:43:07,186 --> 00:43:09,020
So we do still better.

918
00:43:09,020 --> 00:43:12,880
And now we look at, again,
my profile data.

919
00:43:12,880 --> 00:43:16,690
And what I see is, OK, I am
spending a lot of time in

920
00:43:16,690 --> 00:43:17,060
matrix multiply.

921
00:43:17,060 --> 00:43:18,310
That's OK.

922
00:43:18,310 --> 00:43:20,760
But also, I am spending time
in this matrix get method.

923
00:43:20,760 --> 00:43:26,410
I am trying to get each,
basically, row and data

924
00:43:26,410 --> 00:43:27,660
elements of the matrix.

925
00:43:27,660 --> 00:43:32,480

926
00:43:32,480 --> 00:43:35,280
So here is the summary
of these things.

927
00:43:35,280 --> 00:43:37,000
I'm going to just keep that
one, because I gave you

928
00:43:37,000 --> 00:43:38,370
[INAUDIBLE].

929
00:43:38,370 --> 00:43:41,500
So the problem in this
object-orientedness is I

930
00:43:41,500 --> 00:43:43,560
[? handle ?] each row is
represented separately--

931
00:43:43,560 --> 00:43:45,380
nice objects.

932
00:43:45,380 --> 00:43:47,300
These objects overload
the memory.

933
00:43:47,300 --> 00:43:50,130
And in fact, if I want to get to
my row, I have to appoint a

934
00:43:50,130 --> 00:43:50,650
[? chase ?]

935
00:43:50,650 --> 00:43:51,610
to go get that.

936
00:43:51,610 --> 00:43:53,380
It's not contiguous in memory.

937
00:43:53,380 --> 00:43:55,970
And to get to every row, I have
to actually go through

938
00:43:55,970 --> 00:43:58,270
this [? point ?] indexing
in here.

939
00:43:58,270 --> 00:44:00,990
[UNINTELLIGIBLE] matrices are
one, nice, simple objects.

940
00:44:00,990 --> 00:44:04,170
So we can say, OK,
wait a minute.

941
00:44:04,170 --> 00:44:06,470
This method call overhead
actually keeps dominating.

942
00:44:06,470 --> 00:44:09,195
And because I have done this
nice object-oriented thing in

943
00:44:09,195 --> 00:44:14,270
here, so I can get to the
object-oriented file here and

944
00:44:14,270 --> 00:44:15,800
say, OK, get to the objects.

945
00:44:15,800 --> 00:44:23,380
And I write this nice loop
of [UNINTELLIGIBLE].

946
00:44:23,380 --> 00:44:28,210
And I have two-dimensional
matrices in here.

947
00:44:28,210 --> 00:44:30,670
And voila, I got another
2.2x performance

948
00:44:30,670 --> 00:44:32,230
improvement in here.

949
00:44:32,230 --> 00:44:35,400
OK, now I am down to about
seven cycles for each

950
00:44:35,400 --> 00:44:36,510
operation in here.

951
00:44:36,510 --> 00:44:38,300
That's good.

952
00:44:38,300 --> 00:44:39,550
So what can we do here?

953
00:44:39,550 --> 00:44:42,750
So we did all of these things in
Java, because it's supposed

954
00:44:42,750 --> 00:44:44,030
to be a nice language.

955
00:44:44,030 --> 00:44:45,510
But Java has a lot
of overhead.

956
00:44:45,510 --> 00:44:47,900
It's like you do memory-bound
checks.

957
00:44:47,900 --> 00:44:49,110
You have to do this bytecode
interpreter

958
00:44:49,110 --> 00:44:50,700
and stuff like that.

959
00:44:50,700 --> 00:44:54,730
On the other hand, C can
run much faster.

960
00:44:54,730 --> 00:44:56,170
You don't have to do any kind
of memory-bound checks.

961
00:44:56,170 --> 00:44:59,740
The compilation happens before
you run, so it is not part of

962
00:44:59,740 --> 00:45:02,430
your cost of doing anything.

963
00:45:02,430 --> 00:45:05,710
So this can work very much like
C. You can just almost

964
00:45:05,710 --> 00:45:06,690
cut and paste.

965
00:45:06,690 --> 00:45:08,260
And if you know, C is
[? index chain, ?]

966
00:45:08,260 --> 00:45:08,510
[? and this ?]

967
00:45:08,510 --> 00:45:10,560
to C is just changing
a few lines.

968
00:45:10,560 --> 00:45:15,320
And I did that, and this is,
basically, the C code in here.

969
00:45:15,320 --> 00:45:16,580
This is the matrix multiply.

970
00:45:16,580 --> 00:45:18,055
This is the kind of allocation
in here.

971
00:45:18,055 --> 00:45:19,990
C can be a little bit
painful to allocate.

972
00:45:19,990 --> 00:45:21,360
This is my matrix multiply.

973
00:45:21,360 --> 00:45:23,232
It will look similar.

974
00:45:23,232 --> 00:45:27,900
And I got 2.1x just by going
from Java to C. This, I was

975
00:45:27,900 --> 00:45:30,280
surprised, because that piece of
Java code looked very much

976
00:45:30,280 --> 00:45:33,700
close to C. But by just doing
that, I actually got another

977
00:45:33,700 --> 00:45:35,320
2.1x performance gain.

978
00:45:35,320 --> 00:45:39,460

979
00:45:39,460 --> 00:45:41,545
Now this is kind of the
high-level stuff.

980
00:45:41,545 --> 00:45:42,930
Now we have to go
to nitty-gritty

981
00:45:42,930 --> 00:45:44,940
details in the hardware.

982
00:45:44,940 --> 00:45:47,890
So in modern hardware, there
are these things called

983
00:45:47,890 --> 00:45:52,130
"performance counters" that
hardware [UNINTELLIGIBLE]

984
00:45:52,130 --> 00:45:54,640
that we can look at what's
happening inside hardware.

985
00:45:54,640 --> 00:45:58,910
We can get a feel for what
happens inside hardware.

986
00:45:58,910 --> 00:46:03,010
And there are things called CPI,
which clock cycles per

987
00:46:03,010 --> 00:46:04,060
instruction.

988
00:46:04,060 --> 00:46:06,590
So that means if the
instructions are running very

989
00:46:06,590 --> 00:46:09,420
slow, you will have multiple
clock cycles per instructions.

990
00:46:09,420 --> 00:46:10,690
That means instructions
are stalling.

991
00:46:10,690 --> 00:46:12,400
Something is wrong.

992
00:46:12,400 --> 00:46:13,820
Cache misses.

993
00:46:13,820 --> 00:46:16,150
So data [UNINTELLIGIBLE]

994
00:46:16,150 --> 00:46:16,690
[? pay attention to ?]

995
00:46:16,690 --> 00:46:17,670
caches and stuff like that.

996
00:46:17,670 --> 00:46:20,870
If data are supposed to be in
cache and have cache hits all

997
00:46:20,870 --> 00:46:22,490
the time, but if you are
getting cache misses,

998
00:46:22,490 --> 00:46:23,490
something is wrong.

999
00:46:23,490 --> 00:46:24,920
Accesses are not good.

1000
00:46:24,920 --> 00:46:27,170
And then also the instructions
retired.

1001
00:46:27,170 --> 00:46:29,400
That means, how many
instructions are being

1002
00:46:29,400 --> 00:46:30,860
executed and retired?

1003
00:46:30,860 --> 00:46:34,630
If you are doing too much
work, the number of

1004
00:46:34,630 --> 00:46:37,820
instructions retired will
go high in here.

1005
00:46:37,820 --> 00:46:39,270
This will give you a few how
many [? work been done. ?]

1006
00:46:39,270 --> 00:46:42,090

1007
00:46:42,090 --> 00:46:46,160
So if you look at this one,
I have a CPI of 4.

1008
00:46:46,160 --> 00:46:51,350
That means for every
instruction, need about almost

1009
00:46:51,350 --> 00:46:53,710
close to five clock cycles,
which is not good.

1010
00:46:53,710 --> 00:46:55,620
I have pretty high cache
miss rate, too--

1011
00:46:55,620 --> 00:46:59,140
L1 cache miss rate and some
L2 cache miss rate.

1012
00:46:59,140 --> 00:47:01,740
This shows how many instruction
are in what they

1013
00:47:01,740 --> 00:47:04,360
call [? vector in the ?]
instruction form called SSE

1014
00:47:04,360 --> 00:47:05,360
instructions.

1015
00:47:05,360 --> 00:47:07,540
And here's the number of
instructions got retired.

1016
00:47:07,540 --> 00:47:11,240
So we just use that
baseline case.

1017
00:47:11,240 --> 00:47:14,750
So one thing I can look at was
my cache miss rates seem to be

1018
00:47:14,750 --> 00:47:16,130
pretty darn high.

1019
00:47:16,130 --> 00:47:17,940
What might be happening here?

1020
00:47:17,940 --> 00:47:19,920
So if you look at matrix
multiply--

1021
00:47:19,920 --> 00:47:23,480
so I have Matrix A in
here and Matrix B--

1022
00:47:23,480 --> 00:47:26,080
what happens is, to calculate
this value in [? Matrix A ?]

1023
00:47:26,080 --> 00:47:32,540
here, basically, I'm going to
go through this row in here,

1024
00:47:32,540 --> 00:47:34,540
and I'm going through
column in here.

1025
00:47:34,540 --> 00:47:36,500
Problem with the column is
it's all [? low in ?]

1026
00:47:36,500 --> 00:47:37,990
the cache.

1027
00:47:37,990 --> 00:47:39,860
This is your linear memory.

1028
00:47:39,860 --> 00:47:44,010
So because this is here two
dimensional, but inside your

1029
00:47:44,010 --> 00:47:45,550
hardware, there's no
two-dimensional memory.

1030
00:47:45,550 --> 00:47:47,810
You have a linear memory, so
this is your linear memory.

1031
00:47:47,810 --> 00:47:49,190
This is how the memory
look like.

1032
00:47:49,190 --> 00:47:50,530
And this is how the
memory look like.

1033
00:47:50,530 --> 00:47:52,520
So each row and column
is in these areas.

1034
00:47:52,520 --> 00:47:55,740
But here, to calculate this
value, this is nice.

1035
00:47:55,740 --> 00:47:58,860
I go through memory that
is contiguous.

1036
00:47:58,860 --> 00:48:02,120
But C, I am jumping all
over my memory.

1037
00:48:02,120 --> 00:48:08,230
And jumping on memory is not
that great cache behavior.

1038
00:48:08,230 --> 00:48:12,280
And contiguous can do much
better, because what you do

1039
00:48:12,280 --> 00:48:15,150
is, when you get a cache, you
get a cache line that has more

1040
00:48:15,150 --> 00:48:16,110
than one word.

1041
00:48:16,110 --> 00:48:18,090
So in here, what you're doing is
you are getting large cache

1042
00:48:18,090 --> 00:48:21,130
lines only using one word before
you go to the next one.

1043
00:48:21,130 --> 00:48:23,550
So that's not great
in cache behavior.

1044
00:48:23,550 --> 00:48:25,710
So what you can do is you can
make this memory contiguous.

1045
00:48:25,710 --> 00:48:29,650

1046
00:48:29,650 --> 00:48:32,765
So what you can do is, in matrix
multiply, normally, n

1047
00:48:32,765 --> 00:48:35,390
to the cubed computation
to the squared data.

1048
00:48:35,390 --> 00:48:39,530
So to help on n to the cubed,
it's OK to do some n to the

1049
00:48:39,530 --> 00:48:40,350
squared processing.

1050
00:48:40,350 --> 00:48:44,130
That means you can do a data
transport before you start

1051
00:48:44,130 --> 00:48:45,140
matrix multiply.

1052
00:48:45,140 --> 00:48:46,690
Do some pre-processing.

1053
00:48:46,690 --> 00:48:48,080
So this will also
come in handy.

1054
00:48:48,080 --> 00:48:50,390
A lot of these problems you'll
do, you'll say, wait a minute,

1055
00:48:50,390 --> 00:48:53,750
can I do a pre-process that's
cheaper that will really

1056
00:48:53,750 --> 00:48:55,870
improve my main processing?

1057
00:48:55,870 --> 00:48:58,850
This is a theme that comes
again and again.

1058
00:48:58,850 --> 00:49:01,410
And then we can get this n to
the cubed running faster since

1059
00:49:01,410 --> 00:49:03,820
this n to the cubed is much
larger than n to the squared.

1060
00:49:03,820 --> 00:49:06,980
You can amortize the
cost of doing that.

1061
00:49:06,980 --> 00:49:10,970
And basically, to get that, we
can transpose some matrix

1062
00:49:10,970 --> 00:49:12,270
going into the squared
operations.

1063
00:49:12,270 --> 00:49:15,530
And then now n to the cubed
might run much faster.

1064
00:49:15,530 --> 00:49:18,160
And in fact, what I have done
here is do the transpose of

1065
00:49:18,160 --> 00:49:18,850
this matrix.

1066
00:49:18,850 --> 00:49:21,280
And now we run this one, and
then everything will be in

1067
00:49:21,280 --> 00:49:23,040
contiguous memory.

1068
00:49:23,040 --> 00:49:26,290
And voila, I got another 3.4x
performance improvement.

1069
00:49:26,290 --> 00:49:29,440
Now I am running almost
at one cycle per op,

1070
00:49:29,440 --> 00:49:30,760
which is really nice.

1071
00:49:30,760 --> 00:49:33,295
But modern superscalars actually
can do better, so we

1072
00:49:33,295 --> 00:49:35,100
can actually end up
being even better.

1073
00:49:35,100 --> 00:49:35,890
So let's see.

1074
00:49:35,890 --> 00:49:37,790
So we are doing that.

1075
00:49:37,790 --> 00:49:39,030
So here is the interesting
thing.

1076
00:49:39,030 --> 00:49:42,540
Now I look at my transpose
[? spot. ?]

1077
00:49:42,540 --> 00:49:45,120
So if you look at that number of
retired instructions, it's

1078
00:49:45,120 --> 00:49:47,450
more or less the same.

1079
00:49:47,450 --> 00:49:50,590
Number of SSE instructions,
more or less the same.

1080
00:49:50,590 --> 00:49:55,960
I have a 2x improvement in
my L1 cache miss rate.

1081
00:49:55,960 --> 00:50:00,090
But more than that, I have a
5x improvement in the CPI.

1082
00:50:00,090 --> 00:50:03,550
That means my instructions
are stalling a lot

1083
00:50:03,550 --> 00:50:06,430
less now than before.

1084
00:50:06,430 --> 00:50:08,330
So this is where I got all my
performance [UNINTELLIGIBLE],

1085
00:50:08,330 --> 00:50:11,220
because my instructions are
running much faster.

1086
00:50:11,220 --> 00:50:13,260
So by looking at data--

1087
00:50:13,260 --> 00:50:14,090
this is the kind of
things you're

1088
00:50:14,090 --> 00:50:15,850
learning during the class--

1089
00:50:15,850 --> 00:50:18,520
then you can get a feel for what
actually is happening and

1090
00:50:18,520 --> 00:50:21,880
why you are getting that.

1091
00:50:21,880 --> 00:50:24,730
You can go do more in the
cache, in memory system.

1092
00:50:24,730 --> 00:50:27,640
So remember, the memory
system, if you're

1093
00:50:27,640 --> 00:50:30,190
[UNINTELLIGIBLE], if you look
at cache, what they're doing

1094
00:50:30,190 --> 00:50:33,130
is we have a small amount of
memory called cache that are

1095
00:50:33,130 --> 00:50:34,280
very fast access.

1096
00:50:34,280 --> 00:50:37,420
And you have a large amount of
memory that is sitting out

1097
00:50:37,420 --> 00:50:39,260
that has slow access.

1098
00:50:39,260 --> 00:50:42,040
So what you want to give user
feel as if they have a lot of

1099
00:50:42,040 --> 00:50:44,460
memory, everything is
very fast access.

1100
00:50:44,460 --> 00:50:47,070
So hardware does a lot of crazy
things to give you the

1101
00:50:47,070 --> 00:50:50,940
illusion that you have very
fast memory and you have a

1102
00:50:50,940 --> 00:50:52,340
huge amount of that.

1103
00:50:52,340 --> 00:50:54,740
Of course, that's not true,
because you have these caches.

1104
00:50:54,740 --> 00:50:57,070
And what happens is, if
you do things wrong,

1105
00:50:57,070 --> 00:50:59,150
these get to slow down.

1106
00:50:59,150 --> 00:51:02,070
So in the cache system, you have
cache [UNINTELLIGIBLE]

1107
00:51:02,070 --> 00:51:06,610
You have each multicore going to
its own L1 cache that goes

1108
00:51:06,610 --> 00:51:09,670
L2 cache, which goes to L3
cache, which goes to the main

1109
00:51:09,670 --> 00:51:11,150
memory system.

1110
00:51:11,150 --> 00:51:13,890
And you want all of the
data to be here.

1111
00:51:13,890 --> 00:51:15,640
If that's the case, things
run very fast.

1112
00:51:15,640 --> 00:51:16,670
If not, you have to go here.

1113
00:51:16,670 --> 00:51:18,470
It will be a lot more
slow, here, even

1114
00:51:18,470 --> 00:51:20,460
slower, here, very slow.

1115
00:51:20,460 --> 00:51:26,600
So the key thing is to get the
data as much as possible here.

1116
00:51:26,600 --> 00:51:28,400
So here's the kind of
cycles, basically.

1117
00:51:28,400 --> 00:51:30,470
One cycle to get here--
three cycles.

1118
00:51:30,470 --> 00:51:32,760
If you go to L2 to L3, you
have to do 14 cycles

1119
00:51:32,760 --> 00:51:33,490
[UNINTELLIGIBLE].

1120
00:51:33,490 --> 00:51:36,440
If you go to L3 to main memory,
you might sometimes

1121
00:51:36,440 --> 00:51:38,830
get hundreds of cycles
delaying here,

1122
00:51:38,830 --> 00:51:40,080
so it's very costly.

1123
00:51:40,080 --> 00:51:41,990

1124
00:51:41,990 --> 00:51:44,800
So the caches are very
temperamental beasts.

1125
00:51:44,800 --> 00:51:48,190
They work beautifully, but when
they don't work, these go

1126
00:51:48,190 --> 00:51:48,880
really haywire.

1127
00:51:48,880 --> 00:51:52,240
So a lot of times, if there's
a performance issue, look at

1128
00:51:52,240 --> 00:51:53,550
the cache miss rate
and say, aha, the

1129
00:51:53,550 --> 00:51:54,830
caches are not behaving.

1130
00:51:54,830 --> 00:51:57,410
And can I help in here?

1131
00:51:57,410 --> 00:51:59,210
So the interesting way
to look at that is

1132
00:51:59,210 --> 00:52:01,460
how much data I touch.

1133
00:52:01,460 --> 00:52:03,350
These graphics didn't show up.

1134
00:52:03,350 --> 00:52:06,190

1135
00:52:06,190 --> 00:52:07,130
Let me explain.

1136
00:52:07,130 --> 00:52:13,070
So if you do matrix multiply,
I am trying to calculate one

1137
00:52:13,070 --> 00:52:14,320
value in here.

1138
00:52:14,320 --> 00:52:19,570

1139
00:52:19,570 --> 00:52:20,880
So this is one value.

1140
00:52:20,880 --> 00:52:22,480
I have to calculate row
and the column.

1141
00:52:22,480 --> 00:52:25,130

1142
00:52:25,130 --> 00:52:27,020
This slide got screwed
up a little bit.

1143
00:52:27,020 --> 00:52:29,905
So just look at this calculating
[? row bar. ?]

1144
00:52:29,905 --> 00:52:33,340
So if I want to calculate a
row of values here, what I

1145
00:52:33,340 --> 00:52:36,570
have to do is I have to get this
row in B, and I have to

1146
00:52:36,570 --> 00:52:41,510
use entire C to calculate one
row of A. That means I have

1147
00:52:41,510 --> 00:52:45,700
touched this much data items
just to get one row of A. So

1148
00:52:45,700 --> 00:52:49,000
that means I am calculating
1,024 values of A. I am

1149
00:52:49,000 --> 00:52:52,370
touching this much data
items to get that.

1150
00:52:52,370 --> 00:52:54,920
On the other hand, if I do block
matrix multiply, that

1151
00:52:54,920 --> 00:52:58,260
means I calculate, basically,
a block in here.

1152
00:52:58,260 --> 00:52:59,940
This block is also 32 by 32.

1153
00:52:59,940 --> 00:53:03,870
It has 1,024 elements in here.

1154
00:53:03,870 --> 00:53:06,230
But I only need a block
here and block here.

1155
00:53:06,230 --> 00:53:09,780
Voila, I only have looked
at 25,000 elements.

1156
00:53:09,780 --> 00:53:13,770
I got the same number of
output calculated.

1157
00:53:13,770 --> 00:53:16,830
But I have only touched much
less elements altogether to

1158
00:53:16,830 --> 00:53:19,700
calculate that.

1159
00:53:19,700 --> 00:53:20,230
Do you see that?

1160
00:53:20,230 --> 00:53:24,110
And nice thing is, if this thing
fits in the cache, I am

1161
00:53:24,110 --> 00:53:26,470
much better than this might
not fit in the cache.

1162
00:53:26,470 --> 00:53:30,510
So I can calculate blocks in
matrix multiply by looking at

1163
00:53:30,510 --> 00:53:33,965
a lot less number of data,
that has a good chance at

1164
00:53:33,965 --> 00:53:35,870
fitting in cache-- and of
course, when you calculate.

1165
00:53:35,870 --> 00:53:37,260
you make sure it
fits in cache--

1166
00:53:37,260 --> 00:53:40,610
than trying to just calculate
row by row.

1167
00:53:40,610 --> 00:53:43,140
So one thing you can do is--

1168
00:53:43,140 --> 00:53:46,950
this is a lot of common
things we do.

1169
00:53:46,950 --> 00:53:48,680
There's many ways to
get same results.

1170
00:53:48,680 --> 00:53:50,950
You can change the
execution order.

1171
00:53:50,950 --> 00:53:51,710
You can change the algorithm.

1172
00:53:51,710 --> 00:53:53,190
You contain data structures.

1173
00:53:53,190 --> 00:53:55,340
And some of them actually
might do

1174
00:53:55,340 --> 00:53:56,080
better than the other.

1175
00:53:56,080 --> 00:53:58,285
And here, we can actually say we
can do the execution of the

1176
00:53:58,285 --> 00:54:01,790
change and make sure we are only
looking at small amount

1177
00:54:01,790 --> 00:54:05,030
of data, you get the same number
of data calculated.

1178
00:54:05,030 --> 00:54:10,350
And of course, sometimes you
have to be careful, because if

1179
00:54:10,350 --> 00:54:12,590
you do things in different
order, you might not get the

1180
00:54:12,590 --> 00:54:13,840
exact same results.

1181
00:54:13,840 --> 00:54:17,260

1182
00:54:17,260 --> 00:54:20,400
So if you are a numerical
[UNINTELLIGIBLE], you say, OK,

1183
00:54:20,400 --> 00:54:22,940
of course, if you're doing A
plus B plus C, what order you

1184
00:54:22,940 --> 00:54:26,330
do will make a difference
[UNINTELLIGIBLE].

1185
00:54:26,330 --> 00:54:28,420
So if you're really careful, you
can't do too many things.

1186
00:54:28,420 --> 00:54:32,250
But most times, you can do
some of these things.

1187
00:54:32,250 --> 00:54:35,070
And in fact, in our final
thing, we might say--

1188
00:54:35,070 --> 00:54:35,920
the final project--

1189
00:54:35,920 --> 00:54:38,580
we will give you something
that you can change the

1190
00:54:38,580 --> 00:54:41,920
algorithm as long as the outcome
of the [? image ?]

1191
00:54:41,920 --> 00:54:43,206
is the same.

1192
00:54:43,206 --> 00:54:44,910
If you look at the [? image, ?]
and the

1193
00:54:44,910 --> 00:54:46,820
[? image ?] outcome
is the same, you

1194
00:54:46,820 --> 00:54:47,910
can go change things.

1195
00:54:47,910 --> 00:54:52,020
So a lot of times, you might
have a lot of freedom to

1196
00:54:52,020 --> 00:54:57,560
change the amount of errors you
get, the precision, and,

1197
00:54:57,560 --> 00:55:00,780
in fact, doing that, that you
can actually get a lot of good

1198
00:55:00,780 --> 00:55:01,810
performance.

1199
00:55:01,810 --> 00:55:03,930
So in here, we are not trying
to change the matrix

1200
00:55:03,930 --> 00:55:04,690
multiply, of course.

1201
00:55:04,690 --> 00:55:07,750
But we are going to change the
way order of operations is

1202
00:55:07,750 --> 00:55:08,160
being done.

1203
00:55:08,160 --> 00:55:10,310
So some people say you
can't do that.

1204
00:55:10,310 --> 00:55:12,040
It would make matrix
multiply different.

1205
00:55:12,040 --> 00:55:14,420
But in most cases, it's OK.

1206
00:55:14,420 --> 00:55:17,690
So what we have done is
done what we call

1207
00:55:17,690 --> 00:55:18,800
block matrix multiply.

1208
00:55:18,800 --> 00:55:20,240
So instead of calculating
rows, you

1209
00:55:20,240 --> 00:55:21,525
calculate blocks at a time.

1210
00:55:21,525 --> 00:55:23,400
So calculating each block,
you only need a

1211
00:55:23,400 --> 00:55:24,240
few amount of data.

1212
00:55:24,240 --> 00:55:27,340
So this is block matrix
multiply.

1213
00:55:27,340 --> 00:55:32,180
And voila, by doing block matrix
multiply, I got even

1214
00:55:32,180 --> 00:55:34,510
1.7x performance
gain from that.

1215
00:55:34,510 --> 00:55:35,680
Now, it's very interesting.

1216
00:55:35,680 --> 00:55:38,690
Now with each block cycle, I
am doing two operations.

1217
00:55:38,690 --> 00:55:39,580
So my [? superscale ?]

1218
00:55:39,580 --> 00:55:40,780
is finally coming to being.

1219
00:55:40,780 --> 00:55:41,900
So every clock cycle,
I am doing

1220
00:55:41,900 --> 00:55:44,880
multiple operations here.

1221
00:55:44,880 --> 00:55:46,860
So this is very interesting,
the number here.

1222
00:55:46,860 --> 00:55:50,540
So now when you look at what
happens here, I am actually

1223
00:55:50,540 --> 00:55:53,730
executing a little bit more
instructions here.

1224
00:55:53,730 --> 00:55:54,350
Why?

1225
00:55:54,350 --> 00:55:55,890
Because I did block
matrix multiply.

1226
00:55:55,890 --> 00:55:58,020
That means I am doing a little
bit more overhead in here.

1227
00:55:58,020 --> 00:55:59,580
I have more loops in here.

1228
00:55:59,580 --> 00:56:00,510
My inner loops are small.

1229
00:56:00,510 --> 00:56:03,820
So I'm doing more instructions
to do execute there.

1230
00:56:03,820 --> 00:56:07,160
So I'm doing more instructions,
but I have a

1231
00:56:07,160 --> 00:56:10,980
huge cache performance gain.

1232
00:56:10,980 --> 00:56:14,360
My cache miss rate now going
down to very, very little--

1233
00:56:14,360 --> 00:56:16,160
0.02%.

1234
00:56:16,160 --> 00:56:18,720
And I have an 8x improvement in
cache, and that means I got

1235
00:56:18,720 --> 00:56:23,420
a 3x improvement in
my CPI in here.

1236
00:56:23,420 --> 00:56:24,350
That's very nice.

1237
00:56:24,350 --> 00:56:27,210
That means even though I'm doing
a little bit more work,

1238
00:56:27,210 --> 00:56:30,870
my cache behavior really, really
helped me in here.

1239
00:56:30,870 --> 00:56:35,940
So kind of says why I am getting
this performance gain.

1240
00:56:35,940 --> 00:56:38,250
So what else can we do?

1241
00:56:38,250 --> 00:56:42,270
We can go look at trying to do a
lot more crazy optimizations

1242
00:56:42,270 --> 00:56:44,690
because if you look at the
modern Intel, they have this

1243
00:56:44,690 --> 00:56:45,980
thing called [? retro ?]
instructions.

1244
00:56:45,980 --> 00:56:49,040
We'll get a little bit more
detail into that later.

1245
00:56:49,040 --> 00:56:57,120
And what we can do sometimes is
you can nudge the compiler

1246
00:56:57,120 --> 00:56:59,830
to take advantage of
these instructions.

1247
00:56:59,830 --> 00:57:02,560
Of course, if nothing helps, you
can actually go write to

1248
00:57:02,560 --> 00:57:03,420
assembly [UNINTELLIGIBLE].

1249
00:57:03,420 --> 00:57:06,500
But hey, I don't want you guys
to write assembly code, at

1250
00:57:06,500 --> 00:57:09,070
least in this class, even though
you can get really good

1251
00:57:09,070 --> 00:57:10,170
performance sometimes.

1252
00:57:10,170 --> 00:57:12,750
But you can kind of give enough
hints to compiler and

1253
00:57:12,750 --> 00:57:16,340
pray and kind of
[UNINTELLIGIBLE]

1254
00:57:16,340 --> 00:57:18,600
there's no nice way on these
things, and hope for the

1255
00:57:18,600 --> 00:57:19,892
compiler to do the
right thing.

1256
00:57:19,892 --> 00:57:22,460

1257
00:57:22,460 --> 00:57:25,800
So in here, we can play
with, a lot of

1258
00:57:25,800 --> 00:57:27,720
times, compiler flags.

1259
00:57:27,720 --> 00:57:29,650
And this is not exact size.

1260
00:57:29,650 --> 00:57:32,250
We keep trying those things, and
sometimes we can even get

1261
00:57:32,250 --> 00:57:35,430
information back and say, OK,
what did the compiler do?

1262
00:57:35,430 --> 00:57:37,900
It might give you some good
hints and say, I couldn't do

1263
00:57:37,900 --> 00:57:40,750
something because of this
part of the code.

1264
00:57:40,750 --> 00:57:44,785
And worst comes to worst, you
can actually generate assembly

1265
00:57:44,785 --> 00:57:47,500
and stare at it.

1266
00:57:47,500 --> 00:57:49,330
And sometimes in this class,
you might actually want to

1267
00:57:49,330 --> 00:57:53,770
stare at assembly and see
what goes on in here.

1268
00:57:53,770 --> 00:57:55,140
So here is the assembly
for here.

1269
00:57:55,140 --> 00:57:58,710
And this is my loop
body in here.

1270
00:57:58,710 --> 00:58:01,250
And then what I can see, I'm not
going to go through that.

1271
00:58:01,250 --> 00:58:06,180
Everything is converted
into SSE instructions.

1272
00:58:06,180 --> 00:58:08,930
At least I know what SSE
instructions mean, so this

1273
00:58:08,930 --> 00:58:12,190
seems to be doing well,
so I'm happy.

1274
00:58:12,190 --> 00:58:15,330
And in fact, I'm really happy
because I got a 2.8x

1275
00:58:15,330 --> 00:58:17,000
improvement by doing this.

1276
00:58:17,000 --> 00:58:19,190
Now what I am actually doing
each clock cycle about

1277
00:58:19,190 --> 00:58:19,650
[? five of my ?]

1278
00:58:19,650 --> 00:58:20,410
operations.

1279
00:58:20,410 --> 00:58:22,120
So I'm going really
fast in here.

1280
00:58:22,120 --> 00:58:26,950
So now, incidentally, I'm
running my program about

1281
00:58:26,950 --> 00:58:30,360
30,000x faster than
when I started.

1282
00:58:30,360 --> 00:58:32,050
So if you haven't noticed,
these things

1283
00:58:32,050 --> 00:58:34,960
slowly add up in here.

1284
00:58:34,960 --> 00:58:37,390
And now, if we look at what's
going on here, it's very

1285
00:58:37,390 --> 00:58:38,910
interesting.

1286
00:58:38,910 --> 00:58:42,970
What I have done is actually, my
each instruction is now CPI

1287
00:58:42,970 --> 00:58:44,280
has gone down.

1288
00:58:44,280 --> 00:58:46,890
That means each instruction is
running slower because I'm

1289
00:58:46,890 --> 00:58:49,130
doing these SSE instructions.

1290
00:58:49,130 --> 00:58:53,680
And in fact, my cache miss
rate has also gone up.

1291
00:58:53,680 --> 00:58:57,100
But I am doing a lot less
instructions, because SSE

1292
00:58:57,100 --> 00:59:00,010
means in one instruction,
I am doing a vector.

1293
00:59:00,010 --> 00:59:03,700
So I'm actually getting four
operations done in one

1294
00:59:03,700 --> 00:59:04,130
instruction.

1295
00:59:04,130 --> 00:59:06,620
So the number of instructions
I execute has

1296
00:59:06,620 --> 00:59:07,960
drastically gone down.

1297
00:59:07,960 --> 00:59:09,920
So even though my instructions
are running a little bit

1298
00:59:09,920 --> 00:59:12,600
slower, I can basically amortize
that, and I get good

1299
00:59:12,600 --> 00:59:14,970
performance in here.

1300
00:59:14,970 --> 00:59:16,510
And of course, this one says,
OK, now 88% of the

1301
00:59:16,510 --> 00:59:17,790
instructions are SSEs.

1302
00:59:17,790 --> 00:59:18,980
So I have [UNINTELLIGIBLE]

1303
00:59:18,980 --> 00:59:22,850
most of the things to run
on vector mode in here.

1304
00:59:22,850 --> 00:59:25,830
And of course, you can keep
doing more and more and more.

1305
00:59:25,830 --> 00:59:28,430
And there are these four things
[UNINTELLIGIBLE] matrix

1306
00:59:28,430 --> 00:59:31,560
multiply, there are people who
spend their entire lifetime

1307
00:59:31,560 --> 00:59:33,050
trying to run it faster.

1308
00:59:33,050 --> 00:59:35,310
And they have libraries like
that, and this thing called a

1309
00:59:35,310 --> 00:59:40,180
BLAS library and this thing
called Intel Math Kernel.

1310
00:59:40,180 --> 00:59:42,790
So they have done a lot more
other things, like

1311
00:59:42,790 --> 00:59:45,610
pre-fetching, getting
everything right.

1312
00:59:45,610 --> 00:59:47,950
So here what I do is basically
call their library, don't do

1313
00:59:47,950 --> 00:59:50,980
anything myself, and this
library will do it for me.

1314
00:59:50,980 --> 00:59:54,810
And in fact, that library itself
gave me another 2.7x

1315
00:59:54,810 --> 00:59:57,110
performance gain in there.

1316
00:59:57,110 --> 01:00:02,230
Now I'm actually doing 11
instructions every clock cycle

1317
01:00:02,230 --> 01:00:03,190
by doing that.

1318
01:00:03,190 --> 01:00:05,615
And if I look at it carefully,
what they have done is they

1319
01:00:05,615 --> 01:00:09,995
are also very into SSE heavy, so
they are running almost the

1320
01:00:09,995 --> 01:00:11,400
same amount of instructions.

1321
01:00:11,400 --> 01:00:14,240
But they managed to actually
get, again,

1322
01:00:14,240 --> 01:00:16,570
a better miss rate.

1323
01:00:16,570 --> 01:00:19,310
They probably figured out some
stuff in here, things like

1324
01:00:19,310 --> 01:00:20,800
pre-fetching instructions.

1325
01:00:20,800 --> 01:00:23,290
And that means they actually
brought down the CPI back to

1326
01:00:23,290 --> 01:00:24,590
where it was before.

1327
01:00:24,590 --> 01:00:28,370
And that's why they gave that
performance [INAUDIBLE].

1328
01:00:28,370 --> 01:00:30,510
And finally, we are
doing a little

1329
01:00:30,510 --> 01:00:33,770
bit of parallel execution.

1330
01:00:33,770 --> 01:00:35,960
So as you know, multicores
are here.

1331
01:00:35,960 --> 01:00:38,220
And you're actually going to
these 12-core machines, which

1332
01:00:38,220 --> 01:00:40,170
is really nice.

1333
01:00:40,170 --> 01:00:41,810
Fastest possible machines
you can get your

1334
01:00:41,810 --> 01:00:42,980
hands on these days.

1335
01:00:42,980 --> 01:00:46,920
And it's kind of fun
to work with them.

1336
01:00:46,920 --> 01:00:52,620
And we can basically get
concurrency for parallel

1337
01:00:52,620 --> 01:00:53,140
performance.

1338
01:00:53,140 --> 01:00:56,410
And a large part of this class
at the end is trying to get

1339
01:00:56,410 --> 01:00:58,290
good parallel performance.

1340
01:00:58,290 --> 01:00:59,880
But there are a lot of issues
with parallelism.

1341
01:00:59,880 --> 01:01:02,270
I will go a couple
of issues now.

1342
01:01:02,270 --> 01:01:05,910
Of course, when we get to that
part, we will go in detail.

1343
01:01:05,910 --> 01:01:08,790
One thing is called
Amdahl's Law.

1344
01:01:08,790 --> 01:01:10,920
Because if you're trying to run
something parallel, you

1345
01:01:10,920 --> 01:01:13,030
can't run everything parallel.

1346
01:01:13,030 --> 01:01:15,860
When you start something,
there's part of the code that

1347
01:01:15,860 --> 01:01:18,850
basically starts the
computation.

1348
01:01:18,850 --> 01:01:21,580
And then there are the parts you
can run parallel, and then

1349
01:01:21,580 --> 01:01:23,670
other parts you have
to also wind down,

1350
01:01:23,670 --> 01:01:25,690
measure, things like that.

1351
01:01:25,690 --> 01:01:29,620
So what Amdahl's Law says is,
even if you have an infinite

1352
01:01:29,620 --> 01:01:33,660
number of processors, the
maximum speedup you can get is

1353
01:01:33,660 --> 01:01:36,200
this-- this the part of the code
that are sequential and

1354
01:01:36,200 --> 01:01:38,100
[? parts of ?] the code
are parallel.

1355
01:01:38,100 --> 01:01:42,640
That means if you have only
90% of the code can be

1356
01:01:42,640 --> 01:01:45,570
parallelized, the maximum
speedup you can get is 10,

1357
01:01:45,570 --> 01:01:47,850
even if you have an infinite
number of processors.

1358
01:01:47,850 --> 01:01:50,110
So 99% of the code is parallel,
the maximum speedup

1359
01:01:50,110 --> 01:01:51,550
you can get is 100.

1360
01:01:51,550 --> 01:01:55,030
So this is very interesting,
because even if you have huge

1361
01:01:55,030 --> 01:01:59,090
[UNINTELLIGIBLE], if your code
has a certain amount of

1362
01:01:59,090 --> 01:02:04,030
sequential part, there is a
limit of what you can do.

1363
01:02:04,030 --> 01:02:04,740
[UNINTELLIGIBLE]

1364
01:02:04,740 --> 01:02:06,440
also think of load balancing.

1365
01:02:06,440 --> 01:02:09,320
That means when you get
parallel, what you are

1366
01:02:09,320 --> 01:02:12,890
assuming is all processes
are busy all the time.

1367
01:02:12,890 --> 01:02:14,720
That's very hard to achieve.

1368
01:02:14,720 --> 01:02:16,100
Sometimes some processes
are busy.

1369
01:02:16,100 --> 01:02:17,490
Sometimes others are not.

1370
01:02:17,490 --> 01:02:19,050
Sometimes something's
finished early.

1371
01:02:19,050 --> 01:02:21,550
So if that happens, even though
you have parallels, you

1372
01:02:21,550 --> 01:02:25,740
might not get really good
performance in here.

1373
01:02:25,740 --> 01:02:27,350
And then, of course, there's a
thing called granularity of

1374
01:02:27,350 --> 01:02:27,960
parallelism.

1375
01:02:27,960 --> 01:02:29,440
Normally, what happens in
programming, you run

1376
01:02:29,440 --> 01:02:31,540
sequential part, you run
parallel part, you run

1377
01:02:31,540 --> 01:02:33,090
sequential parallel part.

1378
01:02:33,090 --> 01:02:35,070
To start parallelism
is a big task.

1379
01:02:35,070 --> 01:02:37,720
You have to tell all the other
processors, OK, look, you guys

1380
01:02:37,720 --> 01:02:38,800
have some work.

1381
01:02:38,800 --> 01:02:41,960
Go do that, and finish
that, and come back.

1382
01:02:41,960 --> 01:02:44,270
And that itself is
very expensive.

1383
01:02:44,270 --> 01:02:46,760
And the work you give
is very small.

1384
01:02:46,760 --> 01:02:47,500
You give the work.

1385
01:02:47,500 --> 01:02:50,570
The start-up and tear-down cost
of parallelism itself

1386
01:02:50,570 --> 01:02:53,330
might be too much than
the speedup you get.

1387
01:02:53,330 --> 01:02:57,080
In fact, you can make programs
parallel and slow them down

1388
01:02:57,080 --> 01:02:58,410
really drastically.

1389
01:02:58,410 --> 01:03:00,680
So you have to be very careful
in that, because you can say,

1390
01:03:00,680 --> 01:03:02,400
oh, I'm running in parallel,
but it's running slow.

1391
01:03:02,400 --> 01:03:03,590
Why?

1392
01:03:03,590 --> 01:03:06,005
Because you might have a
granularity issue, because we

1393
01:03:06,005 --> 01:03:09,470
are giving so little work that
even though you're getting

1394
01:03:09,470 --> 01:03:11,220
parallelism, you might not get
that much performance.

1395
01:03:11,220 --> 01:03:13,840

1396
01:03:13,840 --> 01:03:18,060
So if you look at matrix
multiply, what happens here is

1397
01:03:18,060 --> 01:03:21,460
you can divide the matrix into
two different parts in here

1398
01:03:21,460 --> 01:03:26,270
and calculate as two different
matrix multiplies in a

1399
01:03:26,270 --> 01:03:29,220
different core or a
different process.

1400
01:03:29,220 --> 01:03:31,580
Or in here, I can make it two
or whatever in matrix

1401
01:03:31,580 --> 01:03:32,650
multiplies to do that.

1402
01:03:32,650 --> 01:03:34,360
So this is nicely
parallelized.

1403
01:03:34,360 --> 01:03:36,125
No big issue in here.

1404
01:03:36,125 --> 01:03:40,030
And when I parallelize, I got
rather 3.5x performance

1405
01:03:40,030 --> 01:03:41,130
improvement.

1406
01:03:41,130 --> 01:03:46,120
Of course, I could have gotten
more here, but because I had

1407
01:03:46,120 --> 01:03:49,070
to start here, since I didn't
want to wait for a week for

1408
01:03:49,070 --> 01:03:51,910
this to finish, I had to come up
with a smaller matrix size.

1409
01:03:51,910 --> 01:03:55,270
But now, when I get here,
this is so fast, my

1410
01:03:55,270 --> 01:03:57,080
granularity is too small.

1411
01:03:57,080 --> 01:03:58,620
So that's why I didn't get--

1412
01:03:58,620 --> 01:04:00,790
because this was an eight-core
machine, I didn't get 8x

1413
01:04:00,790 --> 01:04:03,690
because the minute I get
parallelism, it just is over

1414
01:04:03,690 --> 01:04:06,550
because it's running
so fast in here.

1415
01:04:06,550 --> 01:04:08,800
Now, at this point, using eight
cores, I am actually

1416
01:04:08,800 --> 01:04:09,510
getting 36.

1417
01:04:09,510 --> 01:04:12,470
So this is not that interesting,
because this is

1418
01:04:12,470 --> 01:04:15,825
not running on one core, every
clock cycle, I am trying to

1419
01:04:15,825 --> 01:04:18,580
get 36 operations done.

1420
01:04:18,580 --> 01:04:25,480
So the interesting thing to
follow in here is that this

1421
01:04:25,480 --> 01:04:28,610
example might be somewhat
contrived, because I started

1422
01:04:28,610 --> 01:04:31,380
with this immutable-type
matrix multiply.

1423
01:04:31,380 --> 01:04:35,180
Hopefully, you and your
colleagues who haven't taken

1424
01:04:35,180 --> 01:04:38,060
this class probably
won't go that far.

1425
01:04:38,060 --> 01:04:42,220
But even if you start somewhere
here, still, there's

1426
01:04:42,220 --> 01:04:45,610
a huge gain to be made.

1427
01:04:45,610 --> 01:04:49,450
So to put this in perspective,
OK, so if you look at here,

1428
01:04:49,450 --> 01:04:55,310
even if I start here somewhere
out in C, I already got 131x

1429
01:04:55,310 --> 01:04:56,930
performance improvement.

1430
01:04:56,930 --> 01:04:58,880
That's huge.

1431
01:04:58,880 --> 01:05:00,980
So that means in this
class, we are not

1432
01:05:00,980 --> 01:05:02,080
talking about small things.

1433
01:05:02,080 --> 01:05:06,810
And in fact, this is why I had
to go log scale in doing

1434
01:05:06,810 --> 01:05:10,050
absolute grades, because we
had people who figured out

1435
01:05:10,050 --> 01:05:13,160
exactly how, in some problems,
to change data structures,

1436
01:05:13,160 --> 01:05:16,080
change execution ordering, do
some really cool thing, get

1437
01:05:16,080 --> 01:05:18,050
1,000x performance
improvement.

1438
01:05:18,050 --> 01:05:20,100
And there are other people who
did little bit things and got

1439
01:05:20,100 --> 01:05:22,630
2x improvement.

1440
01:05:22,630 --> 01:05:25,890
So the fact that what you can
do the best here can be

1441
01:05:25,890 --> 01:05:28,050
really, really high.

1442
01:05:28,050 --> 01:05:30,580
So that's why we actually
started giving you the second

1443
01:05:30,580 --> 01:05:34,330
chance to also go basically see,
if you missed a critical

1444
01:05:34,330 --> 01:05:38,110
point, to go back
and learn that.

1445
01:05:38,110 --> 01:05:40,820
So to put this in
perspective--

1446
01:05:40,820 --> 01:05:44,230
so in summary, what
we have done is--

1447
01:05:44,230 --> 01:05:47,850
matrix multiply might
be an exception.

1448
01:05:47,850 --> 01:05:50,565
Every time you go, you might
not get that much, but in

1449
01:05:50,565 --> 01:05:54,080
here, going from this very
software-engineered, nice

1450
01:05:54,080 --> 01:05:54,940
[UNINTELLIGIBLE]

1451
01:05:54,940 --> 01:06:00,890
to a BLAS [UNINTELLIGIBLE],
I got a 296,000 times

1452
01:06:00,890 --> 01:06:01,960
performance improvement.

1453
01:06:01,960 --> 01:06:04,890
This is huge.

1454
01:06:04,890 --> 01:06:08,100
If you put this in perspective,
if you look at

1455
01:06:08,100 --> 01:06:10,920
miles per gallon between
these two-- a

1456
01:06:10,920 --> 01:06:13,285
supertanker and this scooter--

1457
01:06:13,285 --> 01:06:16,520
it's only 14,000x, basically.

1458
01:06:16,520 --> 01:06:19,900
So what that means is you have
to have 20 supertankers to

1459
01:06:19,900 --> 01:06:22,000
basically match this
much improvement.

1460
01:06:22,000 --> 01:06:24,020
So if somebody can tell you,
look, I can't get a

1461
01:06:24,020 --> 01:06:27,000
supertanker to run at the same
miles per gallon as your

1462
01:06:27,000 --> 01:06:28,840
scooter, that would
be revolutionary.

1463
01:06:28,840 --> 01:06:31,800

1464
01:06:31,800 --> 01:06:34,830
But you can do something similar
in your computer.

1465
01:06:34,830 --> 01:06:37,390
And in fact, if you look at
these large data centers, the

1466
01:06:37,390 --> 01:06:39,630
computers are actually
burning a lot of

1467
01:06:39,630 --> 01:06:42,050
energy, a lot of cost.

1468
01:06:42,050 --> 01:06:44,550
And in fact, if you can reduce
that, that can have impact.

1469
01:06:44,550 --> 01:06:48,590
So what you get out of this
class is to think through

1470
01:06:48,590 --> 01:06:50,840
these programs this way.

1471
01:06:50,840 --> 01:06:52,890
Learn about correctness before
[UNINTELLIGIBLE]

1472
01:06:52,890 --> 01:06:56,840
6.170, you hopefully are good
at writing correct programs.

1473
01:06:56,840 --> 01:06:58,840
But that's not good enough
in a lot of times.

1474
01:06:58,840 --> 01:07:01,320
And how to think about programs,
how to make it run

1475
01:07:01,320 --> 01:07:04,080
faster, there are multiple
[UNINTELLIGIBLE] necessary.

1476
01:07:04,080 --> 01:07:06,880
So for example, in here, it's
basically direct energy.

1477
01:07:06,880 --> 01:07:13,720
So can you get the data center
to burn 10% less energy,

1478
01:07:13,720 --> 01:07:14,970
that's huge.

1479
01:07:14,970 --> 01:07:17,010

1480
01:07:17,010 --> 01:07:21,270
For example, if you look at
today's supercomputing

1481
01:07:21,270 --> 01:07:26,410
centers, they spend about $10
million a year on energy, just

1482
01:07:26,410 --> 01:07:27,360
for the power grid.

1483
01:07:27,360 --> 01:07:29,010
So if you can say, I'm
[? playing with ?]

1484
01:07:29,010 --> 01:07:31,060
10%, that's huge.

1485
01:07:31,060 --> 01:07:38,140
Other thing, one thing Charles
says a lot of times is, what

1486
01:07:38,140 --> 01:07:42,730
performance gives you is
currency, because performance

1487
01:07:42,730 --> 01:07:45,800
itself might not be that
directly useful, but it gives

1488
01:07:45,800 --> 01:07:48,330
you currency to buy
something else.

1489
01:07:48,330 --> 01:07:49,030
What it is?

1490
01:07:49,030 --> 01:07:51,890
So assume you have a
nice application.

1491
01:07:51,890 --> 01:07:54,620
Your GUI is doing OK.

1492
01:07:54,620 --> 01:07:58,630
It's up to the point that people
don't feel stalled.

1493
01:07:58,630 --> 01:08:01,160
But if you want to do something
addition, if you add

1494
01:08:01,160 --> 01:08:04,420
a feature, you don't want to
slow down the program too

1495
01:08:04,420 --> 01:08:06,630
much, because your users
are going to complain.

1496
01:08:06,630 --> 01:08:08,710
So what performance gives you
is, if you improve the

1497
01:08:08,710 --> 01:08:11,310
performance, it gives you
currency, the ability to buy

1498
01:08:11,310 --> 01:08:13,530
some additional features
into your system.

1499
01:08:13,530 --> 01:08:17,029
Or you have a system that you
created a start up, and you

1500
01:08:17,029 --> 01:08:19,250
created this nice server
somewhere.

1501
01:08:19,250 --> 01:08:21,850
Suddenly, it became very
popular, and a million people

1502
01:08:21,850 --> 01:08:23,829
want to use your software.

1503
01:08:23,829 --> 01:08:27,340

1504
01:08:27,340 --> 01:08:30,080
And the software might not
be ready to do that.

1505
01:08:30,080 --> 01:08:32,680
It might just crash and burn, or
you might have to pay a lot

1506
01:08:32,680 --> 01:08:36,569
of money for Amazon for
the [? cycles, ?]

1507
01:08:36,569 --> 01:08:39,500
so by actually making programs
faster, you can make it

1508
01:08:39,500 --> 01:08:41,859
scalable, you can make
it efficient.

1509
01:08:41,859 --> 01:08:45,970
So this gives you the ability,
this currency, to go do a lot

1510
01:08:45,970 --> 01:08:48,410
more other interesting things
to your software because now

1511
01:08:48,410 --> 01:08:49,600
you can run it faster.

1512
01:08:49,600 --> 01:08:52,870
It gives you this margin
to play with.

1513
01:08:52,870 --> 01:08:55,500
So in this class, we are
going to learn a

1514
01:08:55,500 --> 01:08:56,370
lot about these things.

1515
01:08:56,370 --> 01:09:00,319
We are going to learn about
architecture, how looking at

1516
01:09:00,319 --> 01:09:02,560
the architectural issues,
how to identify when

1517
01:09:02,560 --> 01:09:03,990
something is wrong.

1518
01:09:03,990 --> 01:09:05,670
We are going to look at
algorithmic issues in here,

1519
01:09:05,670 --> 01:09:06,950
how to look at things.

1520
01:09:06,950 --> 01:09:08,439
We are looking at architectural
things like

1521
01:09:08,439 --> 01:09:10,700
memory systems, parallelism.

1522
01:09:10,700 --> 01:09:14,560
And also we will talk about
all the tools, so you will

1523
01:09:14,560 --> 01:09:15,970
actually know something
has gone wrong.

1524
01:09:15,970 --> 01:09:19,399
Because today, you run the
program, it runs, it runs

1525
01:09:19,399 --> 01:09:21,609
correctly, is it good
enough or not?

1526
01:09:21,609 --> 01:09:22,970
How do you identify that?

1527
01:09:22,970 --> 01:09:25,569
That's, I think, the very
interesting place to start.

1528
01:09:25,569 --> 01:09:29,450
So that's all I have today.

1529
01:09:29,450 --> 01:09:32,710
Make sure you get a Project 0.

1530
01:09:32,710 --> 01:09:37,720
And especially people who
haven't done C, we will--

1531
01:09:37,720 --> 01:09:39,692
Charles, are we going to
do a C remediation?

1532
01:09:39,692 --> 01:09:42,479

1533
01:09:42,479 --> 01:09:44,420
We did last time, didn't we?

1534
01:09:44,420 --> 01:09:46,399
OK, guys, one more thing.

1535
01:09:46,399 --> 01:09:50,380
Last year, we did an evening
C remediation class.

1536
01:09:50,380 --> 01:09:53,510
So if you know Java, if you want
to know C, we will send

1537
01:09:53,510 --> 01:09:58,420
out a time with TAs that will
give you kind of a one-hour

1538
01:09:58,420 --> 01:10:02,400
crash course into C in there.

1539
01:10:02,400 --> 01:10:04,170
And we will send information.

1540
01:10:04,170 --> 01:10:06,370
So this class moves very fast.

1541
01:10:06,370 --> 01:10:10,160
Get on with Project 0, and
see you next week.

1542
01:10:10,160 --> 01:10:11,410